

# Analog Circuit Design

High-speed Clock and Data Recovery,  
High-performance Amplifiers,  
Power Management

*Edited by*

Michiel Steyaert

Arthur H.M. van Roermund

Herman Casier



Springer

*Editors*

Prof.dr.ir. Michiel Steyaert  
Katholieke Universiteit Leuven  
Dept. Electrical Engineering  
(ESAT)  
Kasteelpark Arenberg 10  
3001 Leuven  
Belgium

Prof.dr.ir. Arthur H.M. van Roermund  
Eindhoven University of Technology  
Dept. of Electrical Engineering  
5600 MB Eindhoven  
The Netherlands  
[a.h.m.v.roermund@tue.nl](mailto:a.h.m.v.roermund@tue.nl)

Ir. Herman Casier  
Avondster 6  
8520 Kuurne  
Belgium  
[herman\_casier@ieee.org](mailto:herman_casier@ieee.org)

ISBN: 978-1-4020-8943-5

e-ISBN: 978-1-4020-8944-2

Library of Congress Control Number: 2008934593

© Springer Science+Business Media B.V. 2009

No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Printed on acid-free paper

9 8 7 6 5 4 3 2 1

[springer.com](http://springer.com)

# Preface

This book is part of the Analog Circuit Design series and contains the revised contributions of all speakers at the 17th workshop on Advances in Analog Circuit Design (AACD), which was organized by Andrea Baschirotto and Piero Malcovati of the University of Pavia. This year it was held at the University of Pavia in the magnificent “Aula Volta” auditorium.

The book contains 18 tutorials, divided into three parts, each discussing a specific, up-to-date topic on new and valuable design ideas in the area of analog circuit design. Each part is presented by six experts in the field, and state-of-the-art information is shared and reviewed. The topics of 2008 are:

- High-speed Clock and Data Recovery
- High-performance Amplifiers
- Power Management

The aim of the AACD workshop is to bring together a group of expert designers to study and discuss new possible and future developments in the area of analog circuit design. Each AACD workshop has given rise to the publication of a book by Springer in their successful series of Analog Circuit Design. This book is number 17 in this series. These books can serve as a reference for those involved in analog and mixed-signal design. The full list of previous books and topics in the series is given below.

We sincerely hope that this 17th book adds value to this series and provides a valuable contribution to our Analog Circuit Design community.

Michiel Steyaert

**Table** Other topics covered previously in this series:

| Year | Location | Topics |
|------|----------|--------|
| 2007 | Oostende (Belgium) | Sensors, Actuators and Power Drivers for the Automotive and Industrial Environment; Integrated PAs from Wireline to RF; Very High Frequency Front Ends |
| 2006 | Maastricht (The Netherlands) | High-speed AD converters; Automotive Electronics: EMC issues; Ultra Low Power Wireless |
| 2005 | Limerick (Ireland) | RF Circuits: Wide Band, Front-Ends, DACs; Design Methodology and Verification of RF and Mixed-Signal Systems; Low Power and Low Voltage |
| 2004 | Montreux (Switzerland) | Sensor and Actuator Interface Electronics; Integrated High-Voltage Electronics and Power Management; Low-Power and High-Resolution ADCs |
| 2003 | Graz (Austria) | Fractional-N Synthesizers; Design for Robustness; Line and Bus drivers |
| 2002 | Spa (Belgium) | Structured Mixed-Mode Design; Multi-Bit Sigma-Delta Converters; Short-Range RF Circuits |
| 2001 | Noordwijk (The Netherlands) | Scalable Analog Circuits; High-Speed D/A Converters; RF Power Amplifiers |
| 2000 | Munich (Germany) | High-Speed A/D Converters; Mixed-Signal Design; PLLs and Synthesizers |
| 1999 | Nice (France) | XDSL and Other Communication Systems; RF-MOST Models and Behavioural Modelling; Integrated Filters and Oscillators |
| 1998 | Copenhagen (Denmark) | 1-Volt Electronics; Mixed-Mode Systems; LNAs and RF Power Amps for Telecom |
| 1997 | Como (Italy) | RF A/D Converters; Sensor and Actuator Interfaces; Low-Noise Oscillators, PLLs and Synthesizers |
| 1996 | Lausanne (Switzerland) | RF CMOS Circuit Design; Bandpass Sigma Delta and Other Data Converters; Translinear Circuits |
| 1995 | Villach (Austria) | Low-Noise/Power/Voltage; Mixed-Mode with CAD tools; Voltage, Current and Time References |
| 1994 | Eindhoven (The Netherlands) | Low-Power Low-Voltage; Integrated Filters; Smart Power |
| 1993 | Leuven (Belgium) | Mixed-Mode A/D Design; Sensor Interfaces; Communication Circuits |
| 1992 | Scheveningen (The Netherlands) | OpAmps; ADC; Analog CAD |

# Contents

## Part I High-Speed Clock and Data Recovery

| Chapter | Authors |
|---------|---------|
| Fundamental Stochastic Jitter Processes Associated with Clock and Data Recovery: A Tutorial | Anthony Fraser Sanders |
| Clock Recovery and Equalization Techniques for Lossy Channels in Multi Gb/s Serial Links | M. Pozzoni, S. Erba, P. Viola, M. Pisati, E. Depaoli, D. Sanzogni, R. Brama, D. Baldi, M. Repossi and F. Svelto |
| Top-Down Bottom-Up Design Methodology for Fast and Reliable Serdes Developments in nm Technologies | Jan Crols |
| Mixed-Signal Implementation Strategies for High Performance Clock and Data Recovery Circuits | Michael H. Perrott |
| Jointly Optimize Equalizer and CDR for Multi-Gigabit/s SerDes | Song Wu and Robert Payne |
| Time to Digital Conversion: An Alternative View on Synchronization | J. Daniels, W. Dehaene and M. Steyaert |

## Part II High-Performance Amplifiers

| Chapter | Authors |
|---------|---------|
| Dynamic Offset Cancellation in Operational Amplifiers and Instrumentation Amplifiers | Johan H. Huijsing |
| Current Sense Amplifiers with Extended Common Mode Voltage Range | W.J. Kindt |
| Low-Voltage Power-Efficient Amplifiers for Emerging Applications | A. López-Martin, R.G. Carvajal, E. López-Morillo, L. Acosta, T. Sánchez-Rodríguez, C. Rubia-Marcos and J. Ramírez-Angulo |
| Integrated Amplifier Architectures for Efficient Coupling to the Nervous System | Timothy Denison, Gregory Molnar and Reid R. Harrison |
| Transimpedance Amplifiers for Extremely High Sensitivity Impedance Measurements on Nanodevices | Giorgio Ferrari, Fabio Gozzini and Marco Sampietro |
| Design of High Power Class-D Audio Amplifiers | Marco Berkhout |

## Part III Power Management

| Chapter | Authors |
|---------|---------|
| Single-Inductor Multiple-Output DC-DC Converters | Massimiliano Belloni, Edoardo Bonizzoni and Franco Maloberti |
| Enhanced Ripple Regulators | Richard Redl |
| Robust DCDC Converter for Automotive Applications | Ivan Koudar |
| Highly Integrated Power Management Integrated Circuits in Advanced CMOS Process Technologies | Mario Manninger |
| Wideband Efficient Amplifiers for On-Chip Adaptive Power Management Applications | Lázaro Marco, Vahid Yousefzadeh, Albert García-Tormo, Alberto Poveda, Dragan Maksimović and Eduard Alarcón |
| Design Methodology and Circuit Techniques for Any-Load Stable LDOs with Instant Load Regulation and Low Noise | Vadim Ivanov |

# Contributors

**L. Acosta** Escuela Superior de Ingenieros, Universidad de Sevilla, Camino de los Descubrimientos s/n, 41092 Sevilla, Spain

**Eduard Alarcón** Technical University of Catalunya, Barcelona, Spain

**D. Baldi** STMicroelectronics, Pavia, Italy

**Massimiliano Belloni** Department of Electronics, University of Pavia, Via Ferrata 1, 27100 Pavia, Italy, massimiliano.belloni@unipv.it

**Marco Berkhout** NXP Semiconductors, Nijmegen, The Netherlands, Marco.Berkhout@nxp.com

**Edoardo Bonizzoni** Department of Electronics, University of Pavia, Via Ferrata 1, 27100 Pavia, Italy, edoardo.bonizzoni@unipv.it

**R. Brama** Università di Modena e Reggio Emilia, Reggio Emilia, Italy

**R. G. Carvajal** Escuela Superior de Ingenieros, Universidad de Sevilla, Camino de los Descubrimientos s/n, 41092 Sevilla, Spain, carvajal@zipi.vs.es

**Jan Crols** AnSem, Heverlee, Belgium, Jan.Crols@ansem.com

**J. Daniels** ESAT-MICAS, K.U. Leuven, Heverlee, Belgium

**W. Dehaene** ESAT-MICAS, K.U. Leuven, Heverlee, Belgium

**Timothy Denison** Medtronic Neuromodulation Technology, Minneapolis, MN 55410, USA

**E. Depaoli** STMicroelectronics, Pavia, Italy

**S. Erba** STMicroelectronics, Pavia, Italy

**Giorgio Ferrari** Politecnico di Milano, Dipartimento di Elettronica e Informazione, P.zza L.da Vinci 32, 20133 Milano, Italy

**Albert García-Tormo** Technical University of Catalunya, Barcelona, Spain

**Fabio Gozzini** Politecnico di Milano, Dipartimento di Elettronica e Informazione, P.zza L.da Vinci 32, 20133 Milano, Italy

**Reid R. Harrison** Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT, 84112, USA

**Johan H. Huijsing** Delft University of Technology, Delft, The Netherlands, j.h.huijsing@tudelft.nl

**Vadim Ivanov** Texas Instruments, Inc., Tucson, AZ 85706, USA, ivanov\_vadim@ti.com

**W.J. Kindt** Delft Design Center, National Semiconductor Corporation, Delft, The Netherlands, Wilko.kindt@nsc.com

**Ivan Koudar** AMI Semiconductor, Czech Republic, Ivan.koudar@amis.com

**A. López-Martin** Department of Electrical & Electronic Engineering, Public University of Navarra, 31620 Pamplona, Spain

**E. López-Morillo** Escuela Superior de Ingenieros, Universidad de Sevilla, Camino de los Descubrimientos s/n, 41092 Sevilla, Spain

**Dragan Maksimović** CoPEC Center, ECE Department, University of Colorado, Boulder, CO 80309-0425, USA, maksimov@colorado.edu

**Franco Maloberti** Department of Electronics, University of Pavia, Via Ferrata 1, 27100 Pavia, Italy, franco.maloberti@unipv.it

**Mario Manninger** austriamicrosystems AG, Unterpremstätten, Austria, ealarcon@eel.upc.edu

**Lázaro Marco** Technical University of Catalunya, Barcelona, Spain

**Gregory Molnar** Medtronic Neuromodulation Technology, Minneapolis, MN 55410, USA

**Robert Payne** Texas Instruments, Inc., Dallas TX, USA

**Michael H. Perrott** Massachusetts Institute of Technology, Cambridge, MA, USA, mhperrott@gmail.com

**M. Pisati** STMicroelectronics, Pavia, Italy

**Alberto Poveda** Technical University of Catalunya, Barcelona, Spain

**M. Pozzoni** STMicroelectronics, Pavia, Italy, massimo.pozzoni@st.com

**J. Ramírez-Angulo** Klipsch School of Electrical and Computer Engineering, New Mexico State University, Dept.3-0 Las Cruces, NM 88003-0001, USA

**Richard Redl** ELFI S.A., Montévaux 14, CH-1726 Farvagny, Switzerland,  
rredl@freesurf.ch

**M. Repossi** STMicroelectronics, Pavia, Italy

**C. Rubia-Marcos** Escuela Superior de Ingenieros, Universidad de Sevilla, Camino de los Descubrimientos s/n, 41092 Sevilla, Spain

**Marco Sampietro** Politecnico di Milano, Dipartimento di Elettronica e Informazione, P.zza L.da Vinci 32, 20133 Milano, Italy, sampietr@elet.polimi.it

**T. Sánchez-Rodríguez** Escuela Superior de Ingenieros, Universidad de Sevilla, Camino de los Descubrimientos s/n, 41092 Sevilla, Spain

**Anthony Fraser Sanders** Senior Principal, Infineon Technologies

**D. Sanzogni** STMicroelectronics, Pavia, Italy

**M. Steyaert** ESAT-MICAS, K.U. Leuven, Heverlee, Belgium

**F. Svelto** Università degli Studi di Pavia, Pavia, Italy

**P. Viola** STMicroelectronics, Pavia, Italy

**Song Wu** Xilinx, Inc., Dallas TX, USA, song.wu@xilinx.com

**Vahid Yousefzadeh** CoPEC Center, ECE Department, University of Colorado, Boulder, CO, 80309-0425, USA

# Part I

## High-Speed Clock and Data Recovery

The first chapter of this book is on high-speed clock and data recovery (CDR) circuits. In modern high-speed communication systems, clock recovery is becoming key to recovering the data accurately. Most architectures are PLL-based, but alternatives such as time-to-digital converters are also coming into the picture. The major requirement for high-speed CDR is jitter, since it directly affects the bit error rate (BER). The basic mechanisms are therefore addressed first, followed by PLL circuits and finally by alternative topologies.

The first paper, by Anthony Sanders, deals with the fundamental stochastic processes underlying jitter. The different jitter sources and definitions are discussed, as well as the effect of channels and intersymbol interference (ISI) on jitter performance. As a result, there is a growing need for reliable stochastic prediction of how each source contributes to the jitter of CDR systems.

The second paper, by Massimo Pozzoni, addresses CDR circuits for lossy channels. As serial communication speeds reach 10 Gb/s, channel attenuation, and hence channel equalization, becomes important. The limited bandwidth results in ISI, so analog boost (to partially equalize the channel) and decision feedback equalization (DFE) techniques can be used. The work presents examples and designs that achieve 10 Gb/s serial communication.

The third paper, by Jan Crols, addresses design methodology. These complex PLL-based CDR circuits require well-defined design flows to keep pace with the ever faster transition of CMOS serial interface circuits from differentiating IP to commodity IP blocks. Using this design flow, 10 Gb/s links in 130 nm CMOS and PCI Express links in 90 nm CMOS are demonstrated.

The fourth paper, by Michael Perrott, describes a design approach for high-speed, high-performance CDR based on a careful mix of analog and digital building blocks: as implementations trend toward more digital circuitry, clear trade-offs between the different building blocks are required. It is shown that a well-chosen combination of analog and digital circuits achieves better performance, in the loop filter, the phase detector and the VCO alike. The result is a 2.5 Gb/s CDR with rms jitter below 1.4 ps.

The fifth paper, by Song Wu, analyzes jointly optimizing the equalizer and the CDR to reduce multi-channel ISI. Not only the equalization and the CDR but also the duty-cycle distortion is jointly optimized. At high speeds, decision feedback equalization (DFE) is also required, which places stringent requirements on the DFE timing to allow a correct update.

The last paper, by J. Daniels, discusses a different approach to CDR, based on time-to-digital converters instead of the classical PLL approach. A possible advantage is that these topologies can also perform CDR for burst-mode applications. A design example and its performance at 1 Gb/s (up to 500 MHz) in 130 nm technology are discussed.

# Fundamental Stochastic Jitter Processes Associated with Clock and Data Recovery: A Tutorial

Anthony Fraser Sanders

**Abstract** This paper provides an introductory tutorial on time jitter: its definitions, its sources in high-speed interconnect systems, and how it is transformed as it propagates from the source to the termination. Mathematical details are kept to a minimum to allow easier reading and entry into this abstract subject; however, extensive references are given for further study of the underlying mathematical theory.

## 1 Introduction

Time jitter is a stochastic process and is, by definition, not exactly predictable. Every measurement and every definition has an associated probability of occurrence or level of confidence. This paper first introduces the state-of-the-art methodology for defining jitter, which forms the basis for all further discussion. Following this, the sources of jitter in typical high-speed interface architectures are highlighted, together with how this jitter is transformed as it propagates from the source to the termination at the receiver sampler. Finally, an overview of laboratory measurement capability is included to bring the rather conceptual subject back to reality.

## Definitions

| Abbreviation | Meaning |
|--------------|---------|
| CDR | Clock and Data Recovery |
| PDF | Probability Density Function |
| CDF | Cumulative Distribution Function |
| DDJ | Data-Dependent Jitter |
| DJ | Deterministic Jitter |
| RJ | Random Jitter |
| PJ | Periodic Jitter |
| SSC | Spread Spectrum Clock |
| PLL | Phase-Locked Loop |
| VCO | Voltage-Controlled Oscillator |
| DLL | Delay-Locked Loop |
| BERT | Bit Error Rate Tester |
| RTScope | Real-Time Scope |
| EQScope | Equivalent-Time Scope |
| PC | Personal Computer |
| RMS | Root Mean Square |

---

A.F. Sanders (✉)  
Senior Principal, Infineon Technologies

## 2 Defining Jitter

Jitter is defined as a continuous signal in the time domain that represents the phase deviation from an ideal integrating phase. The integrating phase is in effect a clock and acts as the reference plane. This jitter can be transformed into the frequency domain without loss of information and can therefore be treated as a linear time-invariant signal.

Due to the nature of clocks, jitter can only be observed at the clock transitions. Extending this idea to a jittered data stream, the jitter can only be observed when data transitions occur, Fig. 1. For generality, this observation point in time shall be referred to as a jitter event, Eqs. 5.1–5.2 [1]. When jittered data is displayed with respect to the reference plane, but on a periodic time axis with modulus equal to the clock period, a so-called accumulated eye is obtained, Fig. 2.



**Fig. 1** Fundamental Jitter



**Fig. 2** Accumulated Eye
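The extraction of jitter events and the folding that produces an accumulated eye can be sketched in a few lines of Python. This is a minimal sketch, assuming an ideal reference plane with edges at integer multiples of a known `period` and edge times that have already been extracted from the waveform; the function names and numbers are illustrative and not taken from [1]. Only the edge crossings are folded, not the full waveform.

```python
def jitter_events(edge_times, period):
    """Pair each observed transition with its nearest ideal reference edge.

    Returns (k, deviation) tuples: k indexes the ideal edge k*period and
    deviation is the jitter event observed at that transition. Bits without
    a transition simply produce no event.
    """
    events = []
    for t in edge_times:
        k = round(t / period)  # nearest edge of the ideal reference plane
        events.append((k, t - k * period))
    return events


def fold_into_eye(edge_times, period):
    """Fold edge times onto one period centred on the ideal edge, [-T/2, T/2)."""
    return [((t + period / 2) % period) - period / 2 for t in edge_times]


# Hypothetical edges (ps) of a data stream with a 100 ps unit interval;
# there is no data transition around 200 ps.
edges = [0.0, 102.0, 298.5, 401.0]
print(jitter_events(edges, 100.0))
print(fold_into_eye(edges, 100.0))
```

Centring the folded axis on the ideal edge keeps early and late events on either side of zero instead of wrapping early edges to the end of the period.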

When specifically considering clocks, jitter can be related to the phase noise spectral density relative to the carrier. The phase noise spectral density of a clock can be translated into the time domain by considering a region of integration. After integration, the resulting power can be converted to an rms phase in radians and then related to time using the clock period.
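The conversion from phase noise to time jitter can be sketched as follows. This is a minimal sketch, assuming the single-sideband phase noise L(f) is given in dBc/Hz at sorted offset frequencies spanning the integration region (e.g. 12 kHz to 20 MHz); the function name is hypothetical. The trapezoidal rule on linear power overestimates steep 1/f² regions when points are sparse, so densely sampled data is assumed.

```python
import math


def rms_jitter_from_phase_noise(offsets_hz, l_dbc_hz, carrier_hz):
    """Convert SSB phase noise L(f) in dBc/Hz to rms time jitter in seconds.

    offsets_hz must be sorted and span the integration region. The linear
    density is integrated with the trapezoidal rule; the factor 2 counts both
    sidebands, giving the rms phase in radians, which is then scaled by
    period / (2*pi).
    """
    density = [10 ** (l / 10) for l in l_dbc_hz]
    power = 0.0
    for i in range(1, len(offsets_hz)):
        df = offsets_hz[i] - offsets_hz[i - 1]
        power += 0.5 * (density[i] + density[i - 1]) * df
    phase_rms_rad = math.sqrt(2 * power)
    return phase_rms_rad / (2 * math.pi * carrier_hz)


# Hypothetical flat -120 dBc/Hz floor over a 1 MHz band on a 1 GHz clock.
print(rms_jitter_from_phase_noise([10e3, 1.01e6], [-120.0, -120.0], 1e9))
```

For this flat example the result is roughly 0.225 ps rms, which follows directly from sqrt(2 · 10⁻¹² · 10⁶) / (2π · 10⁹).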

The reference plane of the jitter can be redefined as a time function of the jitter, e.g. a CDR tracking function, Fig. 3, or, extending this idea, the jitter can be defined with respect to itself (so-called n-cycle jitter), Fig. 4.
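Using the signal itself as the reference plane, n-cycle jitter compares every edge with the edge n cycles later. A minimal sketch, assuming clock edge times and the nominal period are known; the function name and data are hypothetical.

```python
def n_cycle_jitter(edge_times, n, nominal_period):
    """Deviation of every span of n cycles from its nominal length n*T.

    The reference is the signal itself (edge k versus edge k+n), so no
    external clock is needed; n = 1 gives the usual period jitter.
    """
    return [
        (edge_times[k + n] - edge_times[k]) - n * nominal_period
        for k in range(len(edge_times) - n)
    ]


edges = [0, 101, 199, 300, 402]  # hypothetical rising edges, ps, T = 100 ps
print(n_cycle_jitter(edges, 1, 100))
print(n_cycle_jitter(edges, 2, 100))
```

Low-frequency jitter that moves neighbouring edges together largely cancels in the difference, which is why n-cycle jitter grows with n for such content.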

The most important representation of jitter is in the statistical domain, Eqs. 5.3–5.5 [1]. A PDF of the jitter can be created without loss of information by representing the jitter events as a set of Dirac deltas of equal amplitude. As jitter is classically used to predict an error event, which occurs when the jitter exceeds a given value, the PDF is integrated to give the CDF. Edge jitter affects the data bits on both sides of the edge, so the CDF should be integrated from both minus and plus infinity toward the point of interest, Fig. 5 (top).
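The two-sided integration can be sketched on a set of equal-weight jitter events. This is a minimal sketch, assuming both edges bounding a bit share one jitter distribution and ignoring transition density; the names `cdf_from_left`, `cdf_from_right` and `bathtub` are illustrative. Sweeping the sampler position x across the unit interval traces the familiar bathtub curve.

```python
def cdf_from_left(events, x):
    """Integrate the event PDF from minus infinity up to x: P[J <= x]."""
    return sum(1 for j in events if j <= x) / len(events)


def cdf_from_right(events, x):
    """Integrate the event PDF from plus infinity down to x: P[J >= x]."""
    return sum(1 for j in events if j >= x) / len(events)


def bathtub(events, ui, x):
    """Probability that either neighbouring edge crosses a sampler placed x
    after the nominal edge at 0 (0 < x < ui).

    The edge at 0 is late beyond x with P[J >= x]; the edge at ui is early
    beyond x with P[J <= x - ui].
    """
    return cdf_from_right(events, x) + cdf_from_left(events, x - ui)


events = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical jitter events, ps
for x in (1.0, 5.0, 9.0):
    print(x, bathtub(events, 10.0, x))
```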

Since the late 1990s, system-level jitter budgeting has been broken down into two components, RJ and DJ. Both terms are misnomers, but they have acquired their own meanings, which are best defined mathematically, Eqs. 5.12–5.14 [1]. Qualitatively, RJ and DJ represent the underlying sigma and the offset from zero of two Gaussian distributions. Extending this idea, the CDF of the jitter can be normalised to Q, i.e. units of sigma, Fig. 5 (bottom). Once normalised, a Gaussian distribution appears as a straight line with a gradient equal to the inverse of its sigma.



**Fig. 3** CDR Reference Plane



**Fig. 4** n-edge Reference Plane



**Fig. 5** CDF to Q domain
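The Q-domain normalisation and the straight-line property can be illustrated with Python's `statistics.NormalDist`. This is a minimal sketch, assuming the right tail is a single unit-weight Gaussian; practical dual-Dirac fits also normalise the tail amplitude, which is omitted here, and the function names are illustrative. Applying the same fit to the mirrored left tail (x replaced by −x) gives the left mean, and the separation of the two fitted means then plays the role of DJ while the fitted sigma plays the role of RJ.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def q_of_tail(p):
    """Express a tail probability in units of sigma (the Q domain)."""
    return -_STD_NORMAL.inv_cdf(p)


def fit_gaussian_tail(x1, p1, x2, p2):
    """Fit a straight line through two right-tail points (x, P[J >= x]) in Q.

    For a Gaussian tail Q = (x - mean) / sigma, so the gradient is 1/sigma
    and the intercept gives the mean, i.e. the offset from zero.
    """
    q1, q2 = q_of_tail(p1), q_of_tail(p2)
    sigma = (x2 - x1) / (q2 - q1)
    mean = x1 - q1 * sigma
    return sigma, mean


# Hypothetical right tail of a Gaussian with mean 5 ps and sigma 2 ps.
ref = NormalDist(5.0, 2.0)
(xa, pa), (xb, pb) = [(x, 1 - ref.cdf(x)) for x in (9.0, 11.0)]
print(fit_gaussian_tail(xa, pa, xb, pb))
```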

When jitter is time-correlated and can be represented in the frequency domain as a spectral spur, it is termed PJ. With the widespread adoption of equalisation schemes, jitter that can be correlated to the data has come to be termed DDJ.

In a bid to clarify the different terms, the OIF (Optical Internetworking Forum) adopted a set of terms defining the distribution of the jitter and its correlation, Table 1.3 [2]. These definitions can be broken down into three pairs of antonyms: (a) Unbounded or Bounded, (b) Correlated or Uncorrelated, (c) Gaussian or High Probability. In this context, RJ is Gaussian and DJ is High Probability. The distinction between bounded and unbounded refers to whether the distribution extends to infinity, as a Gaussian does, or is limited to a finite peak value. These terms were abbreviated into rather complicated acronyms, which are usually used only by experts in detailed discussions. This text uses the full terms and avoids the acronyms.

Jitter budgeting is a means to estimate all the various jitter sources in the system, predict how they propagate through it, and combine their influence at the terminating sampling register. Gaussian jitter can be specified as a sigma value, or as a peak value for a given probability or BER, whereas High Probability Jitter is always a peak-to-peak value. In jitter budget calculations, the High Probability Jitter terms are summed linearly, whereas the Gaussian terms are combined as a root sum of squares and then multiplied by twice the equivalent number of sigma for the given BER. Clearly, the total jitter must then be less than 1 UI at the target BER if the system is to operate with margin, Section 2.C.4.7 [2].
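The budgeting rule can be written out directly. This is a minimal sketch under a simplifying assumption: the equivalent number of sigma is taken as the one-sided Gaussian tail quantile at the BER, ignoring transition density and the exact BER definition used by a given standard (which shift Q slightly). The function names and budget values are hypothetical.

```python
import math
from statistics import NormalDist


def q_for_ber(ber):
    """Equivalent number of sigma for a one-sided tail probability equal to the BER.

    Simplification: transition density and the exact BER definition are ignored.
    """
    return -NormalDist().inv_cdf(ber)


def total_jitter(high_probability_pp, gaussian_rms, ber):
    """High Probability terms (peak-to-peak) add linearly; Gaussian terms (rms)
    are combined as a root sum of squares and scaled by twice Q(BER)."""
    bounded = sum(high_probability_pp)
    sigma = math.sqrt(sum(s * s for s in gaussian_rms))
    return bounded + 2 * q_for_ber(ber) * sigma


# Hypothetical budget in UI: two bounded terms and two Gaussian sources.
tj = total_jitter([0.10, 0.05], [0.01, 0.01], 1e-12)
print(f"TJ = {tj:.3f} UI, margin = {1 - tj:.3f} UI")
```

At a BER of 10⁻¹² the Gaussian multiplier is about 2 × 7.03, so even small rms terms consume a large share of the unit interval.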

## 3 Sources of Jitter

### 3.1 Clock Sources

Considering the clock source as a black box in the system, there are a number of basic classes to be considered. The specification for clock sources is either defined with a power spectral density for given frequency offsets, or integrated time noise for a given frequency band. For system jitter budgeting, the power spectral density will be propagated through the system as far as possible to allow linear frequency dependencies to be taken into account, before being considered in the time domain or statistical domain.

In the lab, high-fidelity reference generators are used for compliance testing. The output jitter specification of this equipment is such that its influence on the measurement of the DUT (device under test) is insignificant.

In target application systems, cost is of paramount importance, and it is directly related to phase noise performance. An important distinction should be noted: a clock reference's frequency stability and its output jitter are not necessarily related. A stratum clock source may have good long-term frequency stability, yet its output jitter could be unsuitable for a 100 km optical interconnect.

For high-end optical communication systems, VCXOs with reasonable frequency stability of 20–30 ppm and integrated jitter of less than 0.5 ps between 12 kHz and 20 MHz are used. This source of jitter can be considered purely unbounded Gaussian with a well-defined spectrum. If the spectrum is not specifically known, a Leeson approximation can be used, Eq. 13 [3].
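When the spectrum is unknown, a Leeson-type estimate can be generated from a handful of parameters. This is a minimal sketch of the common textbook form of Leeson's model; whether it matches Eq. 13 of [3] term for term is not verified here, and all parameter values below are hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23


def leeson_dbc_hz(offset_hz, f0_hz, q_loaded, p_signal_w, noise_factor,
                  flicker_corner_hz, temperature_k=300.0):
    """Textbook form of Leeson's model, L(offset) in dBc/Hz.

    floor:     2FkT/Ps, the far-out thermal floor
    resonator: (f0 / (2 Q offset))^2 term giving the 1/f^2 region
    flicker:   fc / offset term giving the 1/f^3 region
    """
    floor = 2 * noise_factor * BOLTZMANN_J_PER_K * temperature_k / p_signal_w
    resonator = 1 + (f0_hz / (2 * q_loaded * offset_hz)) ** 2
    flicker = 1 + flicker_corner_hz / offset_hz
    return 10 * math.log10(floor * resonator * flicker)


# Hypothetical source: 155.52 MHz, loaded Q of 20 000, 1 mW, F = 2, fc = 1 kHz.
for f in (1e2, 1e3, 1e4, 1e5):
    print(f, round(leeson_dbc_hz(f, 155.52e6, 2e4, 1e-3, 2.0, 1e3), 1))
```

The resulting curve can be fed directly into a phase-noise-to-jitter integration over the band of interest.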

For cost-sensitive systems, e.g. PCIe, clock buffers are used that provide multiple clock frequencies in the system, usually with a spread spectrum modulation (SSC). These clock sources have been measured to have excessive wideband jitter relative to the underlying phase of the SSC and are normally unsuitable for high-speed interfaces above 2.5 Gbps. This stems from the harsh power supply conditions and the varied clock output frequencies of the device. Although a large portion of the jitter is due to multiple spectral spurs, the jitter is typically considered unbounded Gaussian; however, the exact spectral content must be considered when analysing system performance.

Components can generate their own reference in combination with a quartz crystal. Although the crystal itself can have a phase noise of  $-120 \text{ dBc/Hz}$  @ 100 Hz offset and can again be considered as unbounded Gaussian with a Leeson's spectrum, active circuitry is required in order to fulfil the conditions for oscillation, and this circuitry is a dominant source of jitter.

### 3.2 Active Circuitry

The phase noise performance of any oscillator must be simulated using harmonic balance or shooting methods [4] to correctly account for the folding and integration of flicker and thermal noise. The simulation must include all noise-significant devices, including all biases and reference generators, which, even for ring oscillators, can be the dominant source of jitter.

The small-signal transfer function from the supply to the output should be estimated using harmonic balance. However, the power supply noise itself is hard to define, as its exact spectral content can be neither predefined nor measured. It is recommended that an amplitude vs. frequency mask be defined, and that the jitter for each frequency point be calculated and propagated through the system. Although power supply noise is random in nature, its exact phase relationships and correlations cannot be predicted; the resulting jitter should therefore be considered bounded High Probability Jitter.
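The mask-based procedure can be sketched as a per-frequency calculation followed by a worst-case linear sum. This is a minimal sketch, assuming a supply-to-delay sensitivity S(f) in s/V is available, e.g. from the small-signal supply-to-output analysis; that sensitivity function, the mask values and all names are hypothetical.

```python
def supply_mask_jitter(mask, delay_sensitivity):
    """Propagate a supply-noise amplitude mask to bounded jitter.

    mask:              list of (frequency_hz, peak_amplitude_v) mask points.
    delay_sensitivity: callable f -> |d(delay)/d(Vsupply)| in s/V at f
                       (hypothetical input, e.g. from a small-signal analysis).
    Each tone gives 2 * A * S(f) peak-to-peak jitter. The relative phases of
    the tones are unknown, so the worst case adds them linearly.
    """
    per_point = [(f, 2 * a * delay_sensitivity(f)) for f, a in mask]
    worst_case = sum(j for _, j in per_point)
    return per_point, worst_case


def sensitivity(f_hz):
    # Hypothetical: 5 ps/V at low frequency, rolling off above 100 MHz.
    return 5e-12 / (1 + f_hz / 100e6)


mask = [(1e6, 0.010), (10e6, 0.005), (100e6, 0.002)]  # hypothetical mask
points, total = supply_mask_jitter(mask, sensitivity)
print(points, total)
```

Summing peaks linearly is deliberately pessimistic; it is the natural choice when, as noted above, the phase relationships between supply components cannot be predicted.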

Non-oscillatory circuitry can be simulated using standard time-domain approaches. Single-ended CMOS buffers convert supply noise to time jitter through variation of the decision point: depending on the signal edge and the supply noise, the point at which the decision is made varies. The output jitter must be measured as the “real” receiving circuitry would see it. Assuming a CMOS buffer drives another CMOS buffer, both on the same noisy supply, the output jitter of the first buffer must be measured with reference to half the noisy supply voltage. This demonstrates why CMOS circuitry can be employed for high-speed design, but only where a solid, single supply is used.
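Why the measurement threshold matters can be shown with an idealised edge. This is a minimal sketch under a strong, labelled assumption: the driver produces a linear 0 to VDD edge with a fixed rise time, so its swing follows its own supply. Measured against a fixed threshold, supply noise appears as an edge shift; measured against half the same noisy supply, as the receiving buffer would see it, the shift cancels. All values are hypothetical.

```python
def crossing_time(t_start, rise_time, v_high, threshold):
    """Time at which a linear 0 -> v_high edge (duration rise_time) crosses threshold."""
    return t_start + rise_time * threshold / v_high


# Hypothetical: 1.0 V nominal supply, +50 mV of supply noise during the edge,
# 20 ps rise time. Assumption: the driver swing follows its supply.
VDD_NOM, NOISE, T_RISE = 1.0, 0.05, 20e-12
vdd = VDD_NOM + NOISE

t_ref = crossing_time(0.0, T_RISE, VDD_NOM, VDD_NOM / 2)  # noiseless edge
t_fixed = crossing_time(0.0, T_RISE, vdd, VDD_NOM / 2)    # fixed threshold
t_track = crossing_time(0.0, T_RISE, vdd, vdd / 2)        # receiver on same supply

print(f"fixed threshold:    {(t_fixed - t_ref) * 1e12:+.3f} ps")
print(f"tracking threshold: {(t_track - t_ref) * 1e12:+.3f} ps")
```

In a real circuit the rise time also depends on the supply, so the cancellation is only partial; the sketch isolates the threshold effect.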

Differential CML buffers also have a finite PSRR and CMRR, and their performance must be evaluated in the presence of mismatch to be properly observed. Although Monte Carlo simulation can vary device matching, combining it with a supply noise stimulus makes the simulation set intractable. It is recommended that a short Monte Carlo simulation be used to identify a small subset of worst-case technology subspaces, and that this subset then be used for longer, more rigorous supply simulations.

For both buffer types, a simple mismatch between the responses to rising and falling edges, whether single-ended or differential, leads to duty-cycle distortion (DCD), which must be treated as High Probability Jitter with a frequency component at half the toggle rate.

Although the output data of a sampling register is effectively a buffered version of the sampling clock and can be analysed as such, sampling registers are usually the termination point of a jitter path, i.e. jittered data is sampled by a jittered clock and a decision is made. Due to the non-ideal sampling window of the register, there exists a violation window within which the data will not be correctly determined. The violation window is defined as the region where the clock-to-data-output delay exceeds a predefined value, and it must be analysed in the presence of supply noise. The inherent jitter contribution of the sampler should be considered bounded High Probability Jitter, although it can be shown that this jitter is also correlated to the data being received.
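Extracting the violation window from a simulated sweep can be sketched as below. This is a minimal sketch, assuming the data-to-clock offset has been swept (for one supply-noise condition at a time) and the clock-to-output delay recorded, with `None` marking offsets where the output never resolved; it also assumes a single contiguous window. The sweep values and names are hypothetical.

```python
def violation_window(offsets_s, clk_to_q_s, max_clk_to_q_s):
    """Data-to-clock offsets at which the sampler violates its delay limit.

    offsets_s:      swept data-to-clock offsets (sorted).
    clk_to_q_s:     simulated clock-to-output delay at each offset, or None
                    when the output never resolved.
    max_clk_to_q_s: the predefined clock-to-output delay limit.
    Returns (first, last) violating offset, assuming one contiguous window,
    or None if no offset violates the limit.
    """
    bad = [off for off, d in zip(offsets_s, clk_to_q_s)
           if d is None or d > max_clk_to_q_s]
    return (min(bad), max(bad)) if bad else None


# Hypothetical sweep at one supply-noise condition (values in ps).
offsets = [-30, -20, -10, 0, 10, 20, 30]
delays = [50, 52, 70, None, 68, 51, 50]
print(violation_window(offsets, delays, 60))
```

Repeating the sweep across the supply-noise mask shows how the window widens and moves, which is the sampler's own contribution to the budget.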

### 3.3 PLL

As the clock reference is typically not at the required frequency, PLLs must be used to multiply the reference up to the target frequency. [5] provides the most comprehensive text on PLLs and is used as the reference for all phase transfer definitions.

At the heart of the PLL is a VCO, which is typically either inductor based or ring-oscillator based. The Q factor of the inductor allows state-of-the-art designs to significantly exceed 100 dBc/Hz at 1 MHz offset, whereas leading-edge ring-based oscillators usually lie between 80 and 90 dBc/Hz. These phase noise figures have been successfully optimised using the impulse sensitivity function (ISF) methodology developed in [3], together with the phase noise contribution report generated by harmonic balance simulation. This phase noise sees a high-pass transfer function to the PLL output, Eq. 15.8 [5], and can be propagated through the system and treated as unbounded Gaussian jitter, as for other clock sources.
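As a minimal sketch of how a phase noise figure becomes the unbounded Gaussian jitter referred to above, the single-sideband phase noise $\mathcal{L}(f)$ can be integrated over an offset band, doubled for both sidebands, and converted to time by dividing by $2\pi f_0$. The profile (−100 dBc/Hz at 1 MHz, falling at 20 dB/decade), the 5 GHz carrier and the 100 kHz to 100 MHz integration band are hypothetical. A fixed band is used here as a crude stand-in for the PLL's high-pass transfer, which a fuller analysis would apply as a frequency weighting.

```python
import math

def rms_jitter_from_phase_noise(l_dbc_hz, f_lo, f_hi, f_carrier, n=20000):
    """Integrate SSB phase noise L(f) [dBc/Hz] from f_lo to f_hi (trapezoidal
    rule on a log-spaced grid) and return the RMS time jitter in seconds."""
    ratio = (f_hi / f_lo) ** (1.0 / (n - 1))        # equal resolution per decade
    freqs = [f_lo * ratio ** k for k in range(n)]
    psd = [10.0 ** (l_dbc_hz(f) / 10.0) for f in freqs]
    area = sum(0.5 * (psd[k] + psd[k + 1]) * (freqs[k + 1] - freqs[k])
               for k in range(n - 1))
    sigma_phi = math.sqrt(2.0 * area)               # both sidebands, radians RMS
    return sigma_phi / (2.0 * math.pi * f_carrier)  # radians -> seconds

def lc_vco_profile(f):
    """Hypothetical profile: -100 dBc/Hz at 1 MHz offset, -20 dB/decade."""
    return -100.0 - 20.0 * math.log10(f / 1e6)

sigma_t = rms_jitter_from_phase_noise(lc_vco_profile, 1e5, 1e8, 5e9)
print(f"{sigma_t * 1e12:.2f} ps RMS")
```

For a pure 20 dB/decade profile the integral has the closed form $2\cdot 10^{-10}\cdot 10^{12}\,(1/f_{lo} - 1/f_{hi})$, which gives about 1.42 ps RMS here, so the numerical route mainly pays off for measured, non-ideal profiles.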

Whether the PLL is digital or analog, the VCO control input is modulated in discrete steps. This modulation leads to defined spurs at the output of the PLL, which, if significant, should be treated as bounded high-probability jitter, Section 12.6, App. 10A [5].
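The conversion from spur level to bounded jitter can be sketched as follows, assuming each spur pair arises from small-index sinusoidal phase modulation, so that a sideband at $S$ dBc corresponds to a peak phase deviation $\beta = 2\cdot 10^{S/20}$ rad. The resulting sinusoidal jitter has an arcsine PDF, bounded and most probable near its extremes, which is one way to read the "bounded high probability" classification. The −40 dBc spur level and 5 GHz carrier are hypothetical.

```python
import math

def spur_to_pp_jitter(spur_dbc, f_carrier):
    """Peak-to-peak jitter (s) from a pair of symmetric spurs, each spur_dbc
    below the carrier, assuming small-index sinusoidal phase modulation."""
    beta = 2.0 * 10.0 ** (spur_dbc / 20.0)           # peak phase deviation [rad]
    return 2.0 * beta / (2.0 * math.pi * f_carrier)  # peak-to-peak, in seconds

def sinusoidal_jitter_pdf(t, amplitude):
    """Arcsine PDF of sinusoidal jitter with peak value `amplitude`:
    zero outside +/-amplitude, highest probability near the extremes."""
    if abs(t) >= amplitude:
        return 0.0
    return 1.0 / (math.pi * math.sqrt(amplitude ** 2 - t ** 2))

pp = spur_to_pp_jitter(-40.0, 5e9)   # hypothetical -40 dBc spurs on a 5 GHz clock
print(f"{pp * 1e12:.3f} ps peak-to-peak")
```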

Other components in the PLL loop, e.g. feedback dividers, can be treated in the same way as other active circuitry when calculating their inherent jitter; however, the frequency of any spur must again be propagated to the output of the PLL, as for fractional-N PLLs, Section 15.3.3 [5].

### 3.4 Channels

Channels, or electrical interconnects, are passive multiport systems and can be represented in the frequency domain using s-parameters, or in the time domain using impulse or step responses. When data propagates through a medium whose bandwidth is not sufficient to ensure linear phase (constant group delay), or where non-ideal termination and signal discontinuities cause reflections, intersymbol interference (ISI) occurs. This is best understood by considering the pulse response of the system. Additionally, crosstalk on either data or clock signals leads to amplitude noise, which reflections and discontinuities can further exacerbate; this too can be assessed by observing the pulse response.

**Fig. 6** Channel Jitter

Amplitude-to-time conversion occurs at the signal transitions and converts the ISI into time jitter. This translation should be estimated by observing the statistical contours of the ISI as the signal sample point is swept across the bit period, Fig. 6.

Time jitter from the channel, when driven by high-order polynomial (PRBS) data, shows a bounded, correlated Gaussian distribution. Crosstalk appears as a bounded, uncorrelated Gaussian distribution, but in both cases the spectrum of the jitter can be translated directly from the spectrum of the data.

## 4 Jitter Transfer and Termination

### 4.1 PLLs & CDRs

The transfer of jitter through a system is a complicated combination of functions that are linear in frequency and non-linear in time. As long as the translation is linear, the time jitter can be considered in the frequency domain, but as soon as non-linearity becomes apparent, a time-domain representation of the jitter frequency content and distribution must be used. When the transformation of the time jitter must be performed in the time domain, it is impossible to simulate down to the target BER of the system with the necessary confidence level. In this case the jitter distribution is extrapolated by normalising the CDF to the Q scale and extending the linearly represented Gaussian portion of the distribution.
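The Q-scale extrapolation can be sketched in Python: tail points of the jitter CDF are mapped through the inverse standard normal CDF, where a Gaussian tail becomes a straight line; the line is fitted by least squares and evaluated at the target BER. The synthetic edge (mean −10 ps, σ = 2 ps) is a hypothetical test case, and `statistics.NormalDist` (Python 3.8+) supplies the normal functions.

```python
from statistics import NormalDist

def extrapolate_on_q_scale(times, cdf, target_prob):
    """Fit t = a + b * Q with Q = Phi^-1(p) to measured tail points and
    return (t at target_prob, fitted slope b, i.e. the Gaussian sigma)."""
    q = [NormalDist().inv_cdf(p) for p in cdf]     # CDF normalised to Q scale
    n = len(q)
    q_mean = sum(q) / n
    t_mean = sum(times) / n
    b = (sum((qi - q_mean) * (ti - t_mean) for qi, ti in zip(q, times))
         / sum((qi - q_mean) ** 2 for qi in q))
    a = t_mean - b * q_mean
    return a + b * NormalDist().inv_cdf(target_prob), b

# Hypothetical left-tail points of a Gaussian edge (mean -10 ps, sigma 2 ps)
true_edge = NormalDist(mu=-10.0, sigma=2.0)
times = [-14.0, -15.0, -16.0]
cdf = [true_edge.cdf(t) for t in times]
t_at_ber, slope = extrapolate_on_q_scale(times, cdf, 1e-12)
print(f"edge position at BER 1e-12: {t_at_ber:.2f} ps (sigma {slope:.2f} ps)")
```

In practice only the outermost measured points belong in the fit, since the central part of a real distribution is usually dominated by bounded jitter and is not a straight line on the Q scale.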

A PLL, as partly described already, performs an n-th order low-pass filtering of the jittered clock presented at its input and can be treated as a linear system, Eq. 15.7 [5]. However, in the presence of large phase perturbations, e.g. SSC, the linearity can break down, and it is recommended that a time-discrete model of the PLL be used to capture non-linear and sampling phenomena in the phase detector [6]. DLLs are classically thought to provide no frequency-domain jitter filtering; however, due to the feedback, jitter peaking can be observed [7].
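For the linear case, a sketch assuming the textbook type-2, second-order loop with natural frequency $\omega_n$ and damping $\zeta$ (not a transcription of Eq. 15.7 in [5]) shows both the low-pass jitter transfer $H$ with its peaking and the high-pass error function $1-H$ that governs the jitter a loop, or a CDR, leaves untracked. Frequencies are normalized to $\omega_n$.

```python
import math

def pll_jitter_transfer(w, wn, zeta):
    """H(jw) of a type-2, second-order PLL: low-pass jitter transfer."""
    s = 1j * w
    return (2 * zeta * wn * s + wn ** 2) / (s ** 2 + 2 * zeta * wn * s + wn ** 2)

def error_transfer(w, wn, zeta):
    """1 - H(jw): high-pass function applied to jitter the loop cannot track."""
    return 1 - pll_jitter_transfer(w, wn, zeta)

def peaking_db(wn, zeta, n=10000):
    """Largest |H| in dB found on a log-spaced scan from wn/100 to 100*wn."""
    ws = [wn * 10 ** (-2 + 4 * k / (n - 1)) for k in range(n)]
    return max(20 * math.log10(abs(pll_jitter_transfer(w, wn, zeta))) for w in ws)

print(f"peaking at zeta = 0.707: {peaking_db(1.0, 0.707):.2f} dB")
print(f"|1 - H| at wn/1000: {abs(error_transfer(1e-3, 1.0, 0.707)):.1e}")
```

For $\zeta = 1/\sqrt{2}$ the maximum of $|H|^2$ equals the golden ratio, i.e. about 2.09 dB of peaking; lower damping increases it.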

CDR architectures can be treated as purely linear systems; however, when CDRs are implemented using binary phase detectors, digital loop filters and digitally controlled phase interpolators, the behaviour becomes highly non-linear. The non-linearity stems from the variation of the phase detector gain as a function of the untracked input jitter distribution. Although many theories exist for the translation of a PDF through non-linear functions, Eq. 10.70 [8], and for random sampling, Eq. 11.146–11.153 [8], their extension to an analytical closed-loop solution for a CDR has not been completed. For the most part the jitter of interest is not in band, and therefore simplified theories can be used to linearise the phase detector when calculating the maximum slew-rate performance [9].
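A simple linearisation of a bang-bang phase detector, in the spirit of [9] though not copied from it, assumes a ±1 output and Gaussian untracked jitter of RMS value σ: the average output is then $2\Phi(\phi/\sigma)-1$, whose slope at zero phase error, $2/(\sigma\sqrt{2\pi})$, serves as the small-signal gain. The gain therefore depends on the jitter itself, which is the non-linearity described above. The 0.02 UI jitter value is hypothetical.

```python
import math
from statistics import NormalDist

def bbpd_mean_output(phase_err, sigma):
    """Average +/-1 bang-bang PD output for a static phase error (UI) when
    the untracked jitter is Gaussian with RMS value sigma (UI)."""
    return 2.0 * NormalDist(0.0, sigma).cdf(phase_err) - 1.0

def bbpd_linear_gain(sigma):
    """Small-signal slope of bbpd_mean_output at zero phase error (1/UI)."""
    return 2.0 / (sigma * math.sqrt(2.0 * math.pi))

sigma = 0.02                                    # hypothetical 0.02 UI RMS jitter
print(f"linearised PD gain: {bbpd_linear_gain(sigma):.1f} per UI")
print(f"output at 0.1 UI error (saturated): {bbpd_mean_output(0.1, sigma):.6f}")
```

Halving the input jitter doubles the linearised gain, so loop bandwidth and peaking of a bang-bang CDR shift with the jitter it receives.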

As the CDR recovers a clock from the data, which is in turn used to sample the data, it is the error tracking function of the CDR that is of interest, rather than the forward tracking function. The CDR therefore shows a high-pass filter function, but it cannot easily be treated in the frequency domain: in contrast to classical approaches to solving delta-sigma control functions, the CDR has no front-end anti-aliasing filter to remove images in the folded spectrum.

### 4.2 Clock Architectures

The clock architecture of the system under investigation defines the reference plane of the jitter and therefore how the jitter is transformed, Fig. 7.

In a classic distributed clock system, where each transceiver has its own reference clock source, the jitter at each point in the system is measured with respect to either a fixed-frequency reference or a clock recovered by a reference CDR tracking function. In the latter case the jitter is high-pass filtered, which eliminates any low-frequency noise components and frequency offsets [2].

In systems where the interconnect is contained within the box, e.g. a PC, the reference clock is shared by the transmitting and receiving ends. The jitter transfer from the reference clock to the terminating receiver sampler must then be analysed along two paths: first, via the reference clock trace, the transmitter PLL, the transmitter circuitry and the interconnect channel to the sampler data input; and second, via the reference clock trace and the receiver PLL to the sampler clock input. Because the two paths are correlated, this architecture eliminates the need for the CDR to track large amounts of low-frequency jitter, e.g. SSC. However, it requires exact delay matching of the two paths, including finely controlled transfer characteristics of both the transmit and receive PLLs [1, 11].
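The two-path cancellation can be quantified with a sketch: the timing error at the sampler is the reference jitter filtered by $H_{tx}(s)e^{-sT} - H_{rx}(s)$, where $T$ is the delay mismatch between the paths. Both PLLs are modelled here as hypothetical type-2, second-order loops given as (natural frequency in Hz, damping) pairs. Identical PLLs with no skew cancel perfectly, while a bandwidth mismatch or a skew lets reference jitter such as SSC through.

```python
import cmath
import math

def pll_h(f, fn, zeta):
    """Jitter transfer of a type-2, second-order PLL at frequency f (Hz)."""
    s = 2j * math.pi * f
    wn = 2 * math.pi * fn
    return (2 * zeta * wn * s + wn ** 2) / (s ** 2 + 2 * zeta * wn * s + wn ** 2)

def common_clock_transfer(f, tx, rx, skew_s):
    """|H_tx(f) exp(-j 2 pi f T) - H_rx(f)|: fraction of reference-clock jitter
    at frequency f that appears as data-to-clock timing error at the sampler.
    tx and rx are (natural frequency in Hz, damping) of the two PLLs."""
    h_tx = pll_h(f, *tx) * cmath.exp(-2j * math.pi * f * skew_s)
    return abs(h_tx - pll_h(f, *rx))

# Hypothetical PLLs: matched, then with the RX loop twice as wide, then skewed
print(common_clock_transfer(30e3, (2e6, 0.7), (2e6, 0.7), 0.0))
print(common_clock_transfer(1e6, (2e6, 0.7), (4e6, 0.7), 0.0))
print(common_clock_transfer(1e6, (2e6, 0.7), (2e6, 0.7), 1e-9))
```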

An extension of the centrally distributed clock system is the forwarded-clock architecture. A clock is transmitted in parallel with the data, usually at either the same frequency as the data or half of it. Owing to its jitter correlation with the data, this clock conveys the time jitter of the transmitter to the receiver in order to improve its jitter tolerance. This architecture should not be confused with source-synchronous systems, in which only static alignment between the clock and data occurs. The jitter on the data at any point in the system is measured with respect to the time-linear filtered clock, where the time-linear filter represents the possible skew between the two signals and a 2nd-order PLL.

### 4.3 Channels & Data Pattern

Channels are not only a source of jitter, but also modify certain types of jitter. N-cycle jitter causes so-called jitter amplification, or more precisely pulse distortion, when data or a clock is transmitted through a bandlimited channel.



**Fig. 7** Clock Systems

The data stream can significantly modify the jitter measured in systems where the reference plane is defined by a CDR. For example, 8b/10b data without pre-scrambling can contain low-frequency jitter within the CDR tracking bandwidth. These low frequencies lead to so-called killer-pattern effects. Given a data pattern with low transition density, the mean jitter tends to be late in time. If this pattern continues for a time comparable with the time constant of the CDR, the CDR locks to the mean of this jitter. If the data then suddenly changes to a high transition density, the average jitter is now early in time. Clearly the total jitter seen is then higher than if the data had contained a reasonable mixture of low and high transition densities and the CDR had locked to the average of the two.

A further extension of this idea is simple down-folding. As described above, jitter is actually a time-continuous phenomenon that is only observable at transitions of the clock or data. This is equivalent to sampling the jitter and, likewise, causes folding (aliasing) of the original jitter signal. If significant jitter energy is folded into the tracking bandwidth of the CDR, it can cause incorrect tracking by the CDR and movement of the reference plane.
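Down-folding can be illustrated with a sketch that treats the edges as uniformly spaced samples at rate $f_s$, as for a clock pattern; real data, with random transitions, samples non-uniformly and spreads the folded energy instead. Under that assumption a jitter tone at $f_j$ appears at the aliased frequency $|f_j - f_s\,\mathrm{round}(f_j/f_s)|$.

```python
def aliased_jitter_frequency(f_jitter, f_sample):
    """Apparent frequency of sinusoidal jitter at f_jitter when it is only
    observed at uniformly spaced edges occurring at rate f_sample."""
    return abs(f_jitter - f_sample * round(f_jitter / f_sample))

# A 5.999 GHz jitter tone seen on a 6 Gb/s clock pattern (one edge per UI)
print(aliased_jitter_frequency(5.999e9, 6e9))   # folds down to 1 MHz
```

A tone that far above the CDR bandwidth can therefore reappear at 1 MHz, where it may be tracked and move the reference plane.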

## 5 Measurement

### 5.1 Equipment

As for any other physical phenomenon, measurement methodologies must be defined to give jitter a precise definition. Jitter is mainly measured using three technologies: real-time oscilloscopes (RTScope), bit error rate testers (BERT) and equivalent-time sampling oscilloscopes (EQScope).

RTScopes are easy to understand, as they merely oversample the data or clock signal, typically by a ratio of four to ten. The acquisition depth can be sufficient to capture up to 100,000 possible transitions. The samples from a typical analogue front end with 12 GHz bandwidth and 8-bit resolution can then be post-processed to determine the crossing levels, and CDR tracking functions applied to extract the reference plane.

BERTs operate from a provided reference clock and verify whether the received data is error free. The sample point, i.e. the absolute time offset of the data sampling with respect to the reference clock, can be varied, allowing the CDF of the jitter to be accumulated very quickly. As the reference clock or reference plane must be provided externally to the BERT, a hardware CDR or a fixed-frequency reference clock is required. As the receiver of a BERT takes the place of the normal receiver, its bandwidth must be well in excess of the signal bandwidth and the internal offsets of its input decision circuit must be minimised.

EQScopes are under-sampling, high-bandwidth, high-resolution acquisition systems. By accumulating samples of the signal, a data eye can be collected, from which the distribution of the transition edge can be measured. Like the BERT, an EQScope needs an external trigger or reference plane, which must include a hardware CDR if necessary.

### 5.2 Confidence Level

The measurement of a BER involves the collection of data from a stochastic process. When the ratio of error bits to received bits is evaluated, the observed ratio varies in accordance with a Bernoulli process, Section 2.E.2 [2]: the standard deviation of the observed error count is approximately the square root of the number of errors, so the relative uncertainty of the measured BER decreases with the square root of the number of errors accumulated. Typically, to avoid errors in the measurement of a BER, at least 100 errors (a relative standard deviation of about 10 %) must be accumulated before the ratio is calculated.

### 5.3 Statistical Tools

To accurately predict the statistics of a signal from a jittered transmitter through a bandlimited channel and receiver equalisation, Statistical Signal Analysis tools have been developed, such as Stateye [10]. These tools analytically convolve the various time jitter sources together with the extracted ISI contribution of the channel, to give an amplitude PDF of the received signals. From this PDF, the CDF of the zero crossing can be extracted, and the received jitter predicted.
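The core statistical-eye idea can be sketched as follows. This is not StatEye's implementation, only the convolution it is built on, under the assumptions of ±1 symbols, independent equiprobable bits, an amplitude grid of 0.001 (in the units of the cursors) and Gaussian amplitude noise. Each cursor contributes a two-point PMF, the ISI PMF is their convolution, and convolving with the noise gives the probability that a transmitted one falls below the decision threshold. Time jitter could be added by repeating the evaluation at shifted sampling phases. The cursor and noise values are hypothetical.

```python
import math

def isi_pmf(cursors, step=1e-3):
    """PMF of sum(c_i * b_i) for independent, equiprobable b_i in {-1, +1},
    quantized to multiples of `step`. Returns {bin_index: probability}."""
    pmf = {0: 1.0}
    for c in cursors:
        k = round(c / step)
        nxt = {}
        for idx, p in pmf.items():
            for shift in (-k, k):              # this cursor adds -c or +c
                nxt[idx + shift] = nxt.get(idx + shift, 0.0) + 0.5 * p
        pmf = nxt
    return pmf

def error_probability_for_one(main, isi_cursors, noise_sigma, threshold=0.0, step=1e-3):
    """P(sample < threshold | a +1 was sent): ISI PMF convolved with Gaussian noise."""
    total = 0.0
    for idx, p in isi_pmf(isi_cursors, step).items():
        level = main + idx * step
        # Gaussian tail below the threshold; erfc stays accurate deep in the tail
        total += p * 0.5 * math.erfc((level - threshold) / (noise_sigma * math.sqrt(2.0)))
    return total

print(f"{error_probability_for_one(1.0, [0.2, 0.1], 0.1):.3e}")
```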

## 6 Conclusions

The prediction and measurement of time jitter in interconnect systems is an evolving field, as measurements continue to motivate new theories and methodologies. As interconnect speeds increase while the demand for cheap, low-power components continues, reliable stochastic prediction of all jitter sources in the system becomes necessary.

Anthony Sanders was editor of Section 5 of [1], which is a complete mathematical description of jitter and its breakdown. He was also co-author of [2], in which he included a concise treatment of jitter in the context of high-speed signal compliance testing.

## References

1. JEDEC, "FBDIMM Specification: High Speed Differential PTP Link at 1.5 V," JESD8-18, September 2006. <http://www.jedec.org/download/search/JESD8-18.pdf>
2. Optical Internetworking Forum, "Common Electrical I/O (CEI) – Electrical and Jitter Interoperability agreements for 6G+ bps and 11G+ bps I/O," 28 February 2005. <http://www.oiforum.com/public/documents/OIF_CEI_02.0.pdf>
3. T. H. Lee and A. Hajimiri, "Oscillator phase noise: A tutorial," IEEE Journal of Solid-State Circuits, vol. 35, pp. 326–336, March 2000.
4. K. S. Kundert, "Introduction to RF simulation and its application," IEEE Journal of Solid-State Circuits, vol. 34, pp. 1298–1319, September 1999.
5. F. M. Gardner, "Phaselock Techniques," 3rd ed., Wiley.
6. B. De Muer and M. S. J. Steyaert, "A CMOS monolithic ΔΣ-controlled fractional-N frequency synthesizer for DCS-1800," IEEE Journal of Solid-State Circuits, vol. 37, pp. 835–844, July 2002.
7. M. E. Lee, W. J. Dally, T. Greer, H. Ng, R. Farjad-Rad, J. Poulton, and R. Senthinathan, "Jitter transfer characteristics of delay-locked loops – Theories and design techniques," IEEE Journal of Solid-State Circuits, vol. 38, pp. 614–621, April 2003.
8. A. Papoulis and S. U. Pillai, "Probability, Random Variables and Stochastic Processes," 4th ed., McGraw-Hill, 2001.
9. J. Lee, K. S. Kundert, and B. Razavi, "Analysis and modeling of bang-bang clock and data recovery circuits," IEEE Journal of Solid-State Circuits, vol. 39, pp. 1571–1580, September 2004.
10. A. Sanders, "Channel Compliance Testing Utilizing Novel Statistical Eye Methodology," Euro DesignCon 2004.
11. E. Prete, D. Scheideler, and A. Sanders, "A 100 mW 9.6 Gb/s Transceiver in 90 nm CMOS for Next-Generation Memory Interfaces," IEEE International Solid-State Circuits Conference (ISSCC 2006), Digest of Technical Papers, pp. 253–262, February 6–9, 2006.

# Clock Recovery and Equalization Techniques for Lossy Channels in Multi Gb/s Serial Links

M. Pozzoni, S. Erba, P. Viola, M. Pisati, E. Depaoli, D. Sanzogni, R. Brama,  
D. Baldi, M. Repossi and F. Svelto

**Abstract** A fully integrated 8.5 Gb/s multi-standard DFE receiver for SATA, SAS and FC is presented. This work addresses the impact that data storage communication standards have on data equalization and clock recovery. The data storage environment and its implications for the receiver architecture are described. The implementation of CMOS high-speed circuits is discussed and experimental results from realized prototypes are presented. The main design parameters of early-late digital clock recovery are analyzed, and their relationship to system requirements is investigated. Finally, additional architectures for higher communication speeds are introduced, together with their potential application in the data storage environment.

## 1 Introduction

In the recent past, serial interfaces have progressively replaced older parallel interfaces. In the hard disk drive field, the consumer market has moved from Parallel Advanced Technology Attachment (P-ATA) to Serial ATA (SATA), while the enterprise market has moved from Small Computer System Interface (SCSI) to Fibre Channel (FC) and, more recently, to Serial Attached SCSI (SAS). Meanwhile, the increasing demand for computing power is pushing the required data rates higher, up to 6 Gb/s for SATA/SAS and 8.5 Gb/s for FC.

An extraordinary effort in the area of equalization techniques is underway, and several solutions have been proposed [1–4]. At the same time, industrial requirements pose specific challenges that limit which equalization and clock recovery solutions are well suited to the application.

The goal of this work is to clarify the requirements set by the data storage serial communication environment and to investigate their impact on equalization and clock recovery at high speeds, including communication beyond 10 Gb/s. A multi-standard architecture able to address backplane communication up to 8.5 Gb/s is proposed and measured results are reported.

---

M. Pozzoni (✉)  
STMicroelectronics, Pavia, Italy  
e-mail: massimo.pozzoni@st.com

This paper is organized as follows: Section 2 introduces the data storage environment and the main equalization techniques; Section 3 presents the architecture of the multi-standard 8.5 Gb/s receiver; Section 4 introduces further equalization techniques for communication beyond 10 Gb/s; and Section 5 gives the conclusions.

## 2 The Data Storage Environment and Equalization

The data storage environment that is analyzed in this work is shown in Fig. 1.

A transmitter and a receiver are communicating through a dispersive channel that can be a cable or a backplane.

The signal integrity at the receiver side is impaired by the physical channel in several ways:

- Intersymbol interference (ISI), due to channel bandwidth limits;
- Reflections, caused by limited RF impedance matching, mainly due to connectors;
- Crosstalk, caused by the interference of adjacent channels in backplane communication or between transmitter and receiver.

Among these, ISI is by far the major source of signal corruption.

Taking the high-loss compliance channel of 8.5 Gb/s FC [5] as an example, Fig. 2 shows its response to a rectangular unit pulse one bit interval (1 UI) long.

The frequency-dependent loss produces two main effects: a lower peak value $V_{pk}$ and a pulse tail.

If the received data is sampled at the pulse peak ('cursor'), the UI-spaced samples of the pulse tail of the previously transmitted bits ('postcursors') are a source of ISI, together with the samples preceding the cursor ('precursors'), as shown in Fig. 3a. Assuming negligible precursors, Decision Feedback Equalization (DFE) [6] can be used to remove the ISI caused by postcursors, based on knowledge of the previously transmitted bits. As shown in Fig. 3b, the DFE multiplies the previously decided bits by the estimated values of the channel postcursors ($C_i$), thus reconstructing the ISI and subtracting it from the incoming signal. By sizing the number of corrective taps according to the number of postcursors to be removed, the DFE recovers the received pulse peak free of postcursor ISI. On the other hand, it does nothing to restore the original peak amplitude.

**Fig. 1** Data storage environment

**Fig. 2** Channel loss effect on pulse response
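The DFE principle can be sketched as a full-rate behavioural model with ideal decisions; the receiver described later uses a half-rate, latch-based loop, which this sketch does not attempt to model. The pulse response (cursor 0.4, postcursors 0.3, 0.2, 0.1) is hypothetical and chosen so that the eye is closed without equalization: with taps equal to the postcursors the residual ISI vanishes, while with zero taps decisions fail.

```python
import random

def run_dfe(cursor, postcursors, taps, n_bits=10000, seed=0):
    """Send random +/-1 bits through a channel with the given UI-spaced pulse
    response and equalize with a DFE using tap weights `taps`.
    Returns (bit_errors, worst residual ISI at the slicer input)."""
    rng = random.Random(seed)
    past_tx = [1] * len(postcursors)    # previously transmitted bits
    past_dec = [1] * len(taps)          # previously decided bits
    errors, worst = 0, 0.0
    for _ in range(n_bits):
        bit = rng.choice((-1, 1))
        rx = cursor * bit + sum(c * b for c, b in zip(postcursors, past_tx))
        # Rebuild the postcursor ISI from past decisions and subtract it
        eq = rx - sum(t * d for t, d in zip(taps, past_dec))
        dec = 1 if eq >= 0 else -1
        errors += dec != bit
        worst = max(worst, abs(eq - cursor * bit))
        past_tx = [bit] + past_tx[:-1]
        past_dec = [dec] + past_dec[:-1]
    return errors, worst

print(run_dfe(0.4, [0.3, 0.2, 0.1], [0.3, 0.2, 0.1]))   # matched taps
print(run_dfe(0.4, [0.3, 0.2, 0.1], [0.0, 0.0, 0.0]))   # no equalization
```

Note that the recovered cursor stays at 0.4: the DFE removes the tail but, as stated above, does not restore the original peak.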

Considering that the sum of all the UI-spaced cursors must equal the received dc level ( $V_{DC}$ ), the following equation holds:

$$\sum_{i \neq 0} C_i = V_{DC} - C_0 = V_{DC} - V_{pk} = V_{DC} \cdot \left(1 - \frac{V_{pk}}{V_{DC}}\right) \quad (1)$$

where $C_i$ is the $i^{\text{th}}$ UI-spaced sample of the pulse response at the measurement instant ($C_0 = V_{pk}$ being the cursor), and $V_{pk}$ represents the peak of the channel response to a pulse whose amplitude is $V_{DC}$.



**Fig. 3** ISI and DFE

The vertical eye opening ($V_{eye}$, defined as the difference between the cursor peak amplitude and the worst-case combination of precursors and postcursors at the measurement instant) can be calculated from (1):

$$\begin{aligned}
 V_{eye} &= C_0 - \sum_{i \neq 0} |C_i| = C_0 - \sum_{i \neq 0} C_i - 2 \cdot \sum_{i \neq 0, C_i < 0} |C_i| \\
 &= V_{pk} - V_{DC} \cdot \left( 1 - \frac{V_{pk}}{V_{DC}} \right) - 2 \cdot \sum_{i \neq 0, C_i < 0} |C_i| \\
 &= V_{DC} \cdot \left( 2 \cdot \frac{V_{pk}}{V_{DC}} - 1 \right) - 2 \cdot \sum_{i \neq 0, C_i < 0} |C_i|
 \end{aligned} \tag{2}$$

If the pulse response is unipolar, no negative cursors exist and the vertical eye opening is $V_{DC}$ times a factor that depends only on the normalized pulse peak $V_{pk}/V_{DC}$; in particular, without equalization the eye is open only if $V_{pk} > V_{DC}/2$.
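Eq. (2) can be checked numerically against the direct definition of the vertical eye opening. The UI-spaced pulse samples below are hypothetical, include one precursor and one negative postcursor, and sum to the DC level as in (1).

```python
def eye_opening_direct(samples, main_index):
    """V_eye = C0 - sum of |C_i| over all other UI-spaced samples."""
    c0 = samples[main_index]
    return c0 - sum(abs(c) for i, c in enumerate(samples) if i != main_index)

def eye_opening_eq2(samples, main_index):
    """Eq. (2): V_DC * (2 V_pk / V_DC - 1) - 2 * sum of |negative cursors|."""
    v_dc = sum(samples)                 # samples sum to the DC level, Eq. (1)
    v_pk = samples[main_index]
    neg = sum(-c for i, c in enumerate(samples) if i != main_index and c < 0)
    return v_dc * (2 * v_pk / v_dc - 1) - 2 * neg

pulse = [0.05, 0.45, 0.25, 0.12, 0.05, -0.02]   # hypothetical; cursor at index 1
print(eye_opening_direct(pulse, 1), eye_opening_eq2(pulse, 1))
```

Here $V_{pk}/V_{DC} = 0.5$ exactly, so the unipolar part of (2) is zero and the single negative postcursor closes the eye by twice its magnitude.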

To increase the vertical eye opening, common techniques use analog boost equalizers [7] to recover channel loss. In particular, the loss must be recovered not only at the Nyquist frequency, where the maximum attenuation occurs (4.25 GHz for 8.5 Gb/s FC, as in the examples of this section), but also at lower frequencies.

As an example, Fig. 4 shows a common implementation of a boost equalizer as a cascade of CML zero-pole high-pass stages. In this case, the equalizer has been designed to match the inverse of the channel response at low frequencies.
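A behavioural sketch of such a cascade, assuming each CML stage reduces to a zero-pole pair $H(s) = (1+s/\omega_z)/(1+s/\omega_p)$ normalized to 0 dB at DC, so that each stage contributes up to $\omega_p/\omega_z$ of high-frequency boost. The corner frequencies are hypothetical, and the parasitic output poles of real CML stages are ignored.

```python
import math

def boost_stage(f, f_zero, f_pole):
    """Zero-pole stage (1 + s/wz) / (1 + s/wp), 0 dB at DC, f_pole > f_zero."""
    s = 2j * math.pi * f
    return (1 + s / (2 * math.pi * f_zero)) / (1 + s / (2 * math.pi * f_pole))

def cascade_gain_db(f, stages):
    """Gain in dB of a cascade of (f_zero, f_pole) stages at frequency f."""
    h = 1.0 + 0j
    for f_zero, f_pole in stages:
        h *= boost_stage(f, f_zero, f_pole)
    return 20 * math.log10(abs(h))

# Hypothetical corners: staggered zeros shape the boost over several decades
stages = [(300e6, 600e6), (1e9, 2e9), (3e9, 6e9)]
for f in (100e6, 1e9, 4.25e9):               # 4.25 GHz = Nyquist for 8.5 Gb/s
    print(f"{f / 1e9:5.2f} GHz: {cascade_gain_db(f, stages):5.2f} dB")
```

Staggering the zeros is what lets the boost follow the channel loss at lower frequencies rather than only at Nyquist, which is the property the following paragraphs rely on.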

If the Nyquist boost is increased, data patterns alternating opposite bits (clock patterns) are better equalized, but if the inverse channel response is not matched at lower frequencies, the overall vertical eye opening cannot be improved. As shown in the example of Fig. 5, increasing the boost at Nyquist improves the vertical eye opening until a maximum is reached. This saturation is due to the growth of the negative cursors in (2), caused by channel mismatch at lower frequencies.

Besides ISI, crosstalk from adjacent channels can severely impair transmission performance. To analyze the impact of the boost equalizer when a crosstalk source is present at the equalizer input, the boost equalizer can be modeled as an ideal equalizer with 0 dB gain at the Nyquist frequency, followed by ideal gain stages, as shown in Fig. 6.

**Fig. 4** Analog boost equalizer

**Fig. 5** Effect of channel mismatch

The overall signal-to-crosstalk ratio is not affected by this partition, so without loss of generality it can be completely represented by the effect of the 0 dB gain equalizer. As shown in the following, this equalizer affects the signal and the crosstalk in different ways.

Considering a channel ideally equalized up to the Nyquist frequency, as in Fig. 7a (dashed line), a clock pattern alternating opposite bits has the same amplitude before and after equalization, since it has no spectral content below Nyquist and its fundamental lies at Nyquist, where the equalizer gain is 0 dB.

The amplitude of the clock pattern before equalization can be calculated from the channel pulse response (Fig. 7b) as:

$$V_{CLK} = C_0 + \sum_{i \neq 0} (-1)^{|i|} \cdot C_i \quad (3)$$

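The monotonic behavior of the pulse response on the left and right sides of the cursor $C_0$ implies $C_0 > V_{CLK}$: on each side, the alternating sum $C_{\pm1} - C_{\pm2} + C_{\pm3} - \dots$ of monotonically decreasing positive samples is positive, and it is subtracted from $C_0$ in (3).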
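Eq. (3) and the inequality $C_0 > V_{CLK}$ can be checked on a hypothetical pulse response whose samples decrease monotonically on both sides of the cursor:

```python
def clock_pattern_amplitude(samples, main_index):
    """Eq. (3): V_CLK = C0 + sum over i != 0 of (-1)^|i| * C_i."""
    return sum(c * (-1) ** abs(i - main_index) for i, c in enumerate(samples))

# Hypothetical UI-spaced samples, cursor (0.45) at index 2, monotonic tails
pulse = [0.05, 0.15, 0.45, 0.25, 0.12, 0.05]
v_clk = clock_pattern_amplitude(pulse, 2)
print(f"V_CLK = {v_clk:.2f}, C0 = {pulse[2]:.2f}")
```

The clock-pattern amplitude (0.17) is well below the cursor (0.45), which is why an equalizer that leaves the clock amplitude untouched must pull the pulse peak down towards it.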

On the other hand, after equalization the ISI is negligible, and the peak of the pulse response equals the amplitude of the clock pattern, which is not modified by the equalizer. This leads to the conclusion that the equalizer reduces the pulse peak.

This is confirmed in Fig. 8, which shows the pulse peak of the channel of Fig. 2 as the equalizer boost at the Nyquist frequency (x-axis) is varied, while the low-frequency inverse channel response is matched.

When the input signal is crosstalk, the situation is completely different.



**Fig. 6** Equalizer modeling:  
0 dB boost and gain



**Fig. 7** Effect of the 0 dB boost equalizer on pulse peak

**Fig. 8** Pulse peak reduction by a 0 dB boost equalizer



Crosstalk originates from adjacent transmitting channels and therefore starts from the same spectral content as the signal, but because the crosstalk coupling is high-pass shaped, low-frequency components are suppressed. As a result, the crosstalk energy, mainly present around or beyond Nyquist, is only slightly affected by an equalizer with 0 dB gain at Nyquist.

As a consequence, a high-boost equalizer, as in Fig. 9a, leads to a large pulse peak reduction, together with possible crosstalk enhancement, resulting in a strong degradation of the signal-to-crosstalk ratio. Conversely, a moderate boost aimed simply at compensating the low-frequency part of the channel loss (Fig. 9b) causes only a moderate pulse peak reduction, partially compensated by a small attenuation that can affect the crosstalk too.



**Fig. 9** Analog boost equalizers



**Fig. 10** Partitioning between analog boost and DFE

This analysis leads to the trade-off addressed in this work: a moderate boost analog equalizer to compensate for the low frequency part of the channel loss, followed by a limited number of DFE taps, to compensate for the high frequency part (Fig. 10).

Besides reducing DFE complexity, the analog boost equalizer reduces the impact of precursors, helps the convergence of DFE adaptation and improves the overall clock recovery performance, as shown in the following sections.

## 3 A Multi-Standard 8.5 Gb/s DFE Receiver for SATA, SAS and FC

This section is dedicated to the receiver block, with emphasis on equalization and clock recovery. In particular, the focus is on the SATA, SAS and FC standards.

These standards have some common aspects, but also some specific differences:

- Multi-rate operation is common to all the standards, but at different rates: 1.5, 3, 6 Gb/s in SATA/SAS, 2.125, 4.25, 8.5 Gb/s in FC;
- Cable and backplane equalization is a common requirement, even if at different frequencies and channel losses;
- The maximum frequency difference between an FC transmitter and receiver is limited to $\pm 200$ ppm, while in SATA and SAS, for EMI suppression, the transmitted data can be frequency modulated with a 30 kHz triangular profile, with a maximum amplitude of 5000 ppm (Spread Spectrum Clock – SSC);
- FC requires a shorter locking time (2500 bits) and a lower serial-to-parallel data latency.
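The SSC figures above translate into the frequency slope that the CDR integral path must follow. A sketch, assuming the full 5000 ppm deviation is swept linearly in half of a 30 kHz modulation period, as in down-spreading:

```python
def ssc_slope_ppm_per_ui(deviation_ppm, mod_freq_hz, bit_rate):
    """Steepest frequency slope of a triangular SSC profile, in ppm per UI.
    The frequency ramps through deviation_ppm in half a modulation period."""
    slope_ppm_per_s = deviation_ppm / (0.5 / mod_freq_hz)
    return slope_ppm_per_s / bit_rate           # one UI lasts 1 / bit_rate

for rate in (1.5e9, 3e9, 6e9):
    print(f"{rate / 1e9:.1f} Gb/s: {ssc_slope_ppm_per_ui(5000, 30e3, rate):.3f} ppm/UI")
```

This gives 0.05 ppm/UI at 6 Gb/s and 0.2 ppm/UI at 1.5 Gb/s, so, measured per UI, the lowest rate is the most demanding.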

### 3.1 Design Methodology

The methodology followed in the receiver design entails several considerations: optimum partition of equalization between analog boost circuit and DFE, selection of the best suited clock recovery system and mixed signal verification.

Equalization partitioning takes into account ISI, crosstalk and reflections by means of S-parameter analysis. A random received eye is computed for different configurations of transmission channels and interference to ensure a target bit error rate lower than $10^{-12}$. The best partition between the analog block and the DFE follows from this analysis.

The second step, verification of system performance including both equalization and clock recovery, consists of VHDL modeling and simulation of the whole system. VHDL modeling allows event-based VHDL simulators to be used, minimizing CPU time. Finally, mixed-signal simulation verifies the analog performance of the designed circuit, including parasitic effects; because of the high computing power required, however, it is limited to analysis over short (~µs) time frames.

### 3.2 Architecture

The block diagram of the receiver, implemented in 65 nm CMOS, is shown in Fig. 11.

A programmable gain amplifier (PGA), preserving the linearity of the input chain at different transmitted levels, is followed by an analog boost equalizer and by three sampling and demultiplexing paths. The central data path (C) drives a three-tap DFE whose output is shared with an edge path (E), sampling on data transitions, and an auxiliary path (A) with programmable-threshold samplers for DFE adaptation.

Three phase rotators, driven by the CDR, generate the three clocks from one I/Q reference.

The DFE reconstructs a full-rate data eye from half-rate clocking by means of multiplexers (muxes) in the feedback path (Fig. 12a); to minimize the loop delay, the first tap is taken not from the output of the first flip-flop but from the output of the first latch (Fig. 12b).

**Fig. 11** Receiver block diagram

**Fig. 12** DFE implementation

The timing advantage of this latch-based DFE is shown in Fig. 13.

When a mux input is selected, the latch feeding this input switches from evaluation to hold state; the mux input is therefore already available at mux selection, which reduces the loop delay to the mux propagation delay and avoids the activation time of the second latch of the input flip-flop.

Pseudo-CML logic, shown in Fig. 14, has been selected for optimum speed of the DFE latches, achieving flip-flop delays below 25 ps.

A rail-to-rail CMOS differential clock is used, and the tail current generator has been replaced by programmable resistors to allow low-voltage operation. A series peaking topology is employed, improving the sampler sensitivity and minimizing its delay.

A current-mode DFE sum has been implemented using a CML stage with inductive boost. The 2 nH inductor uses 6 metal layers in a differential spiral topology; it occupies less than $20\,\mu\text{m} \times 20\,\mu\text{m}$ and its self-resonance frequency is higher than 20 GHz (Fig. 15).

The data eye reconstructed by the DFE is shared between the data path and the CDR path. The data path, shown in Fig. 16, uses a demultiplexing ratio of 10 to satisfy the FC requirement for latency minimization. An overall serial-to-parallel latency of less than 40 UI is obtained, including word alignment to a reference ‘comma’ 10-bit word.

**Fig. 13** Timing advantage of a latch-based DFE

**Fig. 14** Pseudo CML latch and FF phase margin

The programmable thresholds of auxiliary path A are controlled, together with the DFE taps, by a Least Mean Squares (LMS) adaptation algorithm. At system start-up, LMS adaptation and CDR convergence take advantage of the initial eye opening provided by the input analog boost equalizer. Moreover, the adaptation can be frozen and the programmable thresholds, together with the phase rotator of this auxiliary path, can be used by an Eye Opening Monitor (EOM) to analyze the eye at the sampling point, optimizing the PGA, the analog boost and the sampling phase.
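The text specifies LMS adaptation but not its exact form; a sign-sign variant, commonly used when the error comes from a slicer with a programmable threshold, is assumed in the following sketch. The adaptive `level` plays the role of the auxiliary-path threshold (also an assumption), and the channel values are hypothetical. With an eye that is open from the start, as the analog boost is meant to ensure, the taps converge to the postcursors and the level to the cursor, dithering by a few steps of `mu`.

```python
import random

def sign(v):
    """-1, 0 or +1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def sign_sign_lms_dfe(pulse, n_bits=20000, mu=0.002, seed=1):
    """Adapt DFE taps and the data level with sign-sign LMS.
    pulse[0] is the cursor, pulse[1:] the postcursors (hypothetical channel)."""
    rng = random.Random(seed)
    n_taps = len(pulse) - 1
    taps = [0.0] * n_taps
    level = 0.5                    # adaptive data-level (error-slicer) threshold
    past_tx = [1] * n_taps         # previously transmitted symbols
    past_dec = [1] * n_taps        # previously decided symbols
    for _ in range(n_bits):
        bit = rng.choice((-1, 1))
        rx = pulse[0] * bit + sum(c * b for c, b in zip(pulse[1:], past_tx))
        y = rx - sum(t * d for t, d in zip(taps, past_dec))    # DFE output
        dec = 1 if y >= 0 else -1
        err = sign(y - level * dec)                            # error-slicer output
        taps = [t + mu * err * d for t, d in zip(taps, past_dec)]
        level += mu * err * dec
        past_tx = [bit] + past_tx[:-1]
        past_dec = [dec] + past_dec[:-1]
    return taps, level

taps, level = sign_sign_lms_dfe([1.0, 0.3, 0.15, 0.05])
print([round(t, 3) for t in taps], round(level, 3))
```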



**Fig. 15** DFE summing node and inductor topology (2 metal example)



**Fig. 16** Data path

### 3.3 Clock Recovery

The clock recovery requirements are strictly related to the adopted communication standard. In all the selected standards (SATA, SAS and FC) the transmitted data is 8b/10b encoded; this implies that a clock pattern can be transmitted as valid data, which prevents the use of clock recovery schemes that rely only on low-frequency patterns [8]. For the same reason, an analog boost equalizer in front of the DFE helps clock recovery during clock patterns and minimizes phase jumps when the pattern changes.

Moreover, the CDR must track frequency drift, including the SSC profile (Fig. 17), which is one of the main CDR challenges in SATA and SAS.

The implemented CDR relies on an early-late technique, sampling the DFE output eye in the center and in the edge (Fig. 18).

The core of the CDR is the proportional-integral (PI) controller of Fig. 19.

The early-late results of $N_{DMX}$ demux samples are counted and multiplied by the proportional gain $K_p$. The result increments a cyclic accumulator, and when the accumulator cycle $CA$ is reached, the phase is advanced or delayed by one step depending on overflow or underflow. The drawback of a proportional controller is its limited capability to track frequency drift (defined as Proportional Tracking, $Pt_{ppm}$):

$$Pt_{ppm} = \frac{T_d \cdot K_p}{CA \cdot NPH} \cdot 10^6 \quad (3)$$



**Fig. 17** SSC profile (panels: SATA down spreading; center spreading)



**Fig. 18** CDR path

where NPH is the number of phases inside a data UI and  $T_d$  is the transition density of the incoming data pattern.
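Eq. (3) can be evaluated for hypothetical settings ($T_d = 0.5$, $K_p = 1$, $CA = 64$, $NPH = 64$; none of these values is given in the text) to see why a proportional path alone struggles:

```python
def proportional_tracking_ppm(td, kp, ca, nph):
    """Eq. (3): largest constant frequency offset (ppm) that the proportional
    path alone can follow."""
    return td * kp / (ca * nph) * 1e6

pt = proportional_tracking_ppm(td=0.5, kp=1, ca=64, nph=64)   # hypothetical
print(f"Pt = {pt:.1f} ppm")
```

The result, about 122 ppm, falls short of the ±200 ppm FC offset, and raising $K_p$ to close the gap widens the loop bandwidth, which motivates the integral path.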

Increasing $K_p$ to satisfy the frequency drift requirements would lead to excessive loop bandwidth and, ultimately, excessive jitter; for this reason an integral path is added, which tracks a constant frequency drift without requiring continuous phase updates from the proportional path. SSC, however, is characterized by a non-constant, triangular frequency drift, which requires continuous updating of the integrator value $I_{val}$ that is injected into the cyclic accumulator at each clock cycle. This leads to the requirement of a very accurate sizing of the integral path, whose maximum capability to track the slope of the frequency drift (defined as Integral Tracking, $It_{ppm/UI}$) is expressed by the following formula:

$$It_{ppm/UI} = \frac{T_d \cdot K_i}{CA \cdot NPH \cdot N_{DMX} \cdot K_s} \cdot 10^6 \quad (4)$$
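A quick way to size the integral path is to compare Eq. (4) with the slope of the SSC profile. The sketch below assumes a SATA-style triangular down-spread of 5000 ppm at a 33 kHz modulation rate (values from the SATA specification as commonly quoted, not from this paper). The function names and the safety margin of 2 are hypothetical, and $K_s$ is simply carried over from Eq. (4).

```python
def it_ppm_per_ui(td, ki, ca, nph, ndmx, ks):
    """Eq. (4): steepest frequency ramp (ppm per UI) the integral path can follow."""
    return td * ki / (ca * nph * ndmx * ks) * 1e6


def ssc_slope_ppm_per_ui(spread_ppm, f_mod_hz, bit_rate_bps):
    """Slope of a triangular SSC profile: the full spread is covered in half a period."""
    half_period_ui = bit_rate_bps / (2.0 * f_mod_hz)
    return spread_ppm / half_period_ui


def integral_path_ok(td, ki, ca, nph, ndmx, ks, slope, margin=2.0):
    """True if Eq. (4) exceeds the required slope by an (arbitrary) safety margin."""
    return it_ppm_per_ui(td, ki, ca, nph, ndmx, ks) >= margin * slope


slope = ssc_slope_ppm_per_ui(5000, 33e3, 6e9)
print(round(slope, 4))   # 0.055 ppm/UI at 6 Gb/s
```

Because $T_d$ appears in the numerator of Eq. (4), the check should be made at the lowest transition density the standard allows, not at the nominal one.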

As a consequence of the early-late (bang-bang) phase detection, the overall CDR is nonlinear. Nonetheless, a linearized model of the loop gain helps in understanding the overall loop behavior. Fig. 20 shows a typical loop-gain plot, which falls at 40 dB/dec at low frequency and at 20 dB/dec at high frequency, correlated with a jitter tolerance plot. In the linearized loop-gain analysis, the phase detector gain has been set to 4, considering that a phase range of $\pm 0.5$ UI ($\pm 0.25$ UI on average) is converted into $\pm 1$ by the phase detector.

**Fig. 19** CDR PI controller

**Fig. 20** CDR linearized analysis

Time-domain simulations show that a low Integral Tracking ($It_{ppm/UI}$) degrades the high-frequency jitter tolerance, thus requiring higher values of $K_i$. On the other hand, increasing $K_i$ reduces the phase margin of the linear loop model, causing negative peaking in the jitter tolerance. To improve the phase margin, $K_p$ can be increased, but the overall latency of the digital loop again degrades the jitter tolerance at high frequency; this latency is therefore the main limitation on overall CDR performance.
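The trade-off between $K_i$, $K_p$ and loop latency can be explored with a continuous-time approximation of the linearized loop. The sketch assumes an open-loop gain $L(s) = K_{PD}(K_p + K_i/s)\,e^{-s\tau}/s$ in normalized units, with $K_{PD} = 4$ as in the text and $\tau$ standing in for the digital-loop latency. It reproduces the 40 dB/dec (below the PI zero) and 20 dB/dec (above it) slopes of Fig. 20, but the model and all parameter values are illustrative assumptions, not the authors' design equations.

```python
import math


def loop_gain_mag(w, kpd, kp, ki):
    """|L(jw)| for L(s) = Kpd * (Kp + Ki/s) / s (the delay term has unit magnitude)."""
    return kpd * math.hypot(kp, ki / w) / w


def phase_margin_deg(kpd, kp, ki, tau):
    """Phase margin of L(s) = Kpd * (Kp + Ki/s) * exp(-s*tau) / s, in degrees."""
    w_lo, w_hi = 1e-6, 1e6
    for _ in range(200):                     # log-scale bisection on |L| = 1
        w_mid = math.sqrt(w_lo * w_hi)
        if loop_gain_mag(w_mid, kpd, kp, ki) > 1.0:
            w_lo = w_mid
        else:
            w_hi = w_mid
    wc = math.sqrt(w_lo * w_hi)              # unity-gain crossover frequency
    # Phase at crossover: -90 deg from 1/s, the PI term, and the pure-delay term.
    phase = -90.0 + math.degrees(math.atan2(-ki / wc, kp) - wc * tau)
    return 180.0 + phase


for kp in (0.25, 0.5):
    print(kp, [round(phase_margin_deg(4, kp, 0.1, tau), 1) for tau in (0.0, 0.5)])
```

With zero latency, doubling $K_p$ raises the phase margin; with $\tau = 0.5$ it lowers it, because the higher crossover frequency picks up more delay phase. This mirrors the statement that latency, not the gains, ultimately limits the loop.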

For this reason, to optimize jitter performance, the CDR of Fig. 19 has been modified as shown in Fig. 21.

The overall latency has been minimized by using multi-rate demultiplexers, so that the CDR always operates at 750 MHz at 1.5, 3 and 6 Gb/s, and by limiting the digital core to three stages. The loop phase margin has been optimized by using decimal values for $K_p$ and $K_i$ instead of the binary (power-of-two) values of previous implementations [9], and bandwidth variations have been minimized by counting the number of data transitions and correcting the proportional and integral gains accordingly.
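Since Eqs. (3) and (4) depend on the products $T_d K_p$ and $T_d K_i$, one way to apply the transition-count correction is to scale both gains inversely with the measured transition density. The helper below is a hypothetical sketch of that idea; the nominal density of 0.5 and the measurement window are assumptions, as the text does not detail how the correction is implemented.

```python
def corrected_gains(kp_nom, ki_nom, transitions, n_bits, td_nom=0.5):
    """Rescale Kp and Ki so that Td*Kp and Td*Ki stay at their nominal values.

    Hypothetical helper: `transitions` data transitions were counted over
    `n_bits` received bits; td_nom is the density the gains were tuned for.
    """
    if transitions == 0:
        return kp_nom, ki_nom        # no transitions: keep nominal gains
    td = transitions / n_bits        # measured transition density
    scale = td_nom / td
    return kp_nom * scale, ki_nom * scale


kp, ki = corrected_gains(1.0, 0.01, transitions=250, n_bits=1000)
print(kp, ki)   # 2.0 0.02: Td halved (0.25), so both gains double
```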

**Fig. 21** CDR jitter optimization

**Fig. 22** CDR implementation

With the above techniques, the jitter tolerance has been optimized in the presence of a slope in the frequency drift, as in the case of SSC. Additional requirements, however, come from the maximum frequency drift to be tracked, expressed by the following relation:

$$PPM_{\max} = \frac{I_{val,\max}}{CA \cdot NPH \cdot N_{DMX}} \cdot 10^6 \quad (5)$$

A high frequency-drift capability requires either a small demultiplexing ratio or a small number of phase-rotator phases. On the other hand, sizing the loop for the large frequency drift required in SATA/SAS would limit the frequency resolution in FC, where the maximum drift is only $\pm 200$ ppm. For these reasons, the CDR in Fig. 21 has been further modified, as shown in Fig. 22.

A double step has been inserted in the phase selection to allow operation with 16 phases instead of 32, giving a maximum tracking capability of $\pm 7800$ ppm at 6 Gb/s without exceeding 750 MHz operation. At the same time, to preserve the frequency resolution in the $\pm 200$ ppm mode, a programmable decimation has been inserted in the frequency path, periodically blanking the injection of the integrated value $I_{val}$ into the cyclic accumulator.
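Eq. (5) shows why the double step helps. If one assumes (hypothetically) that $I_{val,\max}$ equals $CA$, i.e. at most one phase step per 750 MHz demux clock, then $PPM_{\max} = 10^6/(NPH \cdot N_{DMX})$. At 6 Gb/s with $N_{DMX} = 8$ this gives about 3900 ppm with 32 phases, which would not cover SATA's 5000 ppm down-spread, but about 7800 ppm with 16 effective phases, in line with the figure quoted above. The decimation model (injection enabled on one demux cycle out of `decim`) and the value of `CA` are likewise assumptions for illustration.

```python
def ppm_max(ival_max, ca, nph, ndmx):
    """Eq. (5): largest frequency offset (ppm) the integral path can hold."""
    return ival_max / (ca * nph * ndmx) * 1e6


def freq_resolution_ppm(ca, nph, ndmx, decim=1):
    """Frequency step (ppm) per integrator LSB.

    Hypothetical decimation model: I_val is injected on one demux cycle
    out of `decim`, so each LSB is worth 1/decim of the undecimated step.
    """
    return 1e6 / (ca * nph * ndmx * decim)


NDMX_6G = 8          # 6 Gb/s demultiplexed to a 750 MHz CDR clock
CA = 64              # hypothetical accumulator cycle
print(ppm_max(CA, CA, 32, NDMX_6G))   # 3906.25: 32 phases, single step
print(ppm_max(CA, CA, 16, NDMX_6G))   # 7812.5: 16 phases (double step)
```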

Figure 22 also shows the capability of the CDR to increase $K_p$ and $K_i$ for fast locking in FC applications. When an unlock condition is detected, the CDR gains are increased for a fixed time to minimize the locking time and are then set back to their nominal values, which are optimized for minimum jitter.

### 3.4 Measured Results

Prototypes realized in 65 nm CMOS have been packaged in a plastic BGA and mounted in a high-frequency socket on an FR4 board.

The overall jitter tolerance performance, when a larger frequency drift is applied, is shown in Fig. 23. In this plot, 6 Gb/s with $\pm 2500$ ppm and 8.5 Gb/s with $\pm 200$ ppm are compared, both with a lossy channel (38 in. for 6 Gb/s and 30 in. for 8.5 Gb/s) and with a direct connection.



**Fig. 23** Jitter tolerance for 6 Gb/s and 8.5 Gb/s

Under these conditions the overall receiver shows a sinusoidal jitter tolerance of 0.4 UI on top of the intrinsic jitter of the data and clock sources, leaving margin for safe operation when additional jitter from the transmitter is taken into account.

## 4 Beyond 10 Gb/s Serial Communication

One of the techniques applied for high speed serial communication over band-limited channels is duobinary signaling [10, 11]. Instead of compensating the channel loss by equalization, duobinary signaling shapes the channel into the well-known $1 + z^{-1}$ response. This is commonly done by transmitter pre-emphasis.

Two main aspects can be mentioned in duobinary signaling:

- the signal at Nyquist is not recovered;
- the channel is well equalized (within 3 dB loss) up to Nyquist/2.
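The 3 dB figure follows directly from the $1 + z^{-1}$ response evaluated on the unit circle, $z = e^{j 2\pi f T}$, with $T$ the bit period (a standard derivation, added here for clarity):

$$H(f) = 1 + e^{-j2\pi f T} = 2\cos(\pi f T)\, e^{-j\pi f T}, \qquad |H(f)| = 2\,|\cos(\pi f T)|$$

At Nyquist, $f = 1/(2T)$, the response has a null, $|H| = 0$; at Nyquist/2, $f = 1/(4T)$, $|H| = \sqrt{2}$, which is 3 dB below the DC value $|H(0)| = 2$.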

As shown in Fig. 24, the duobinary detector makes use of two samplers, with the following detecting logic:

- If threshold A is exceeded, a ‘1’ is detected
- If threshold B is not exceeded, a ‘0’ is detected
- If threshold B is exceeded but threshold A is not, the detected bit is the complement of the previously detected bit.
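The three-level detection rule can be checked with a small behavioral model. The sketch assumes an ideal, noise-free duobinary channel $y[n] = (x[n] + x[n-1])/2$ with $x = \pm 1$, thresholds A = +0.5 and B = -0.5, and a known initial bit; the function names are hypothetical.

```python
import random


def duobinary_channel(bits, prev_bit=0):
    """Ideal 1 + z^-1 channel, scaled to the levels -1, 0, +1."""
    out = []
    prev = 2 * prev_bit - 1
    for b in bits:
        x = 2 * b - 1                      # map bit to +/-1
        out.append((x + prev) / 2)
        prev = x
    return out


def duobinary_detect(samples, prev_bit=0, thr_a=0.5, thr_b=-0.5):
    """Two-sampler detector: above A -> 1, below B -> 0, in between -> flip."""
    bits = []
    prev = prev_bit
    for y in samples:
        if y > thr_a:
            b = 1
        elif y < thr_b:
            b = 0
        else:
            b = 1 - prev                   # transition: complement of previous bit
        bits.append(b)
        prev = b
    return bits


rng = random.Random(1)
tx = [rng.randint(0, 1) for _ in range(1000)]
rx = duobinary_detect(duobinary_channel(tx))
print(rx == tx)   # True
```

Note that, with this rule, a wrong decision in the middle region propagates to the following bits, since each middle-level decision depends on the previous one.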

In fact, when the signal is in between the two thresholds, a data transition has happened but the channel response has not been completed yet.

The duobinary technique shows that it is not necessary to equalize the channel at Nyquist for data recovery. On the other hand, this implies that clock recovery on a clock pattern is no longer possible, since a 1010 clock pattern lies at the Nyquist frequency, where the $1 + z^{-1}$ response has a null: data encoding and frequency drift requirements must be set consistently to make duobinary communication feasible.

**Fig. 24** Duobinary channel and detection sampling points

Figure 25 shows the ‘eye’ seen by the top duobinary sampler: the samplers  $A_1$  and  $A_2$  in Fig. 25a would detect the same bit, thus leading to the conclusion that the effective eye is the one shown in Fig. 25b.

If the channel is not well equalized to duobinary, either because the clock is not suppressed or because the low-frequency channel equalization is insufficient, the position of the duobinary threshold is no longer optimal; nevertheless, an eye still exists and another optimal decision point can be found, as shown in Fig. 26.

This is the same eye of a ‘Look-Ahead’ DFE [12], which consists of a positive threshold to detect the signal when the previous bit is ‘1’ and a negative threshold when the previous bit is ‘0’.

Assuming the same sampling phase, the look-ahead DFE sampler would see the same eye as in Fig. 26b, and an LMS threshold adaptation would find the optimal vertical threshold level.

Actually, there is no difference between the look-ahead DFE detection logic and the duobinary detection logic, as shown by the example in Fig. 27 in case the previous detected bit is ‘1’.



**Fig. 25** Duobinary eye analysis



**Fig. 26** Optimal decision point in a channel that is not duobinary

**Fig. 27** Equivalence between DB logic and LA DFE logic



The three possible signal levels, either analyzed by a duobinary logic or by a DFE look-ahead logic, result in the same final decision.
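The equivalence of Fig. 27 can be verified exhaustively: for each previously detected bit and each of the three ideal duobinary levels, the duobinary rule and the look-ahead DFE rule give the same decision. The sketch assumes normalized levels (-1, 0, +1) and thresholds of $\pm 0.5$; the function names are hypothetical.

```python
def duobinary_decision(y, prev_bit, thr=0.5):
    """Duobinary rule: above +thr -> 1, below -thr -> 0, otherwise flip."""
    if y > thr:
        return 1
    if y < -thr:
        return 0
    return 1 - prev_bit


def look_ahead_dfe_decision(y, prev_bit, thr=0.5):
    """Look-ahead DFE: positive threshold after a '1', negative after a '0'."""
    threshold = thr if prev_bit == 1 else -thr
    return 1 if y > threshold else 0


levels = (-1.0, 0.0, 1.0)
same = all(
    duobinary_decision(y, p) == look_ahead_dfe_decision(y, p)
    for p in (0, 1)
    for y in levels
)
print(same)   # True
```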

This leads to the conclusion that a look-ahead DFE can still receive duobinary signaling, once the proper sampling phase and thresholds are adopted. To this end, eye-opening monitors and LMS adaptation can be employed to optimize receiver performance, finding the optimal sampling phase and thresholds even when the duobinary shaping of the channel is not ideally achieved.
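As an illustration of how LMS can place the look-ahead thresholds when the channel is not an exact $1 + z^{-1}$, the sketch assumes a two-tap channel $y[n] = h_0 x[n] + h_1 x[n-1] + \text{noise}$ (here $h_0 = 1$, $h_1 = 0.6$) and adapts estimates of both taps with decision-directed LMS. For a previous decision $\hat{x}[n-1] = \pm 1$, the two candidate levels are $\pm h_0 + h_1 \hat{x}[n-1]$, so the optimal look-ahead threshold is their midpoint, $h_1 \hat{x}[n-1]$. The channel values, step size and noise level are hypothetical, and this is one possible formulation rather than the adaptation used in the cited works.

```python
import random


def adapt_look_ahead_thresholds(h0=1.0, h1=0.6, noise=0.05, mu=0.01,
                                n=5000, seed=0):
    """Decision-directed LMS estimates of h0 and h1 (threshold = +/-h1)."""
    rng = random.Random(seed)
    h0_est, h1_est = 0.5, 0.0            # deliberately poor initial guesses
    x_prev = 1                           # true previous symbol (+/-1)
    d_prev = 1                           # previous decision (+/-1)
    for _ in range(n):
        x = rng.choice((-1, 1))
        y = h0 * x + h1 * x_prev + rng.gauss(0.0, noise)
        # Look-ahead decision: threshold +h1_est after a 1, -h1_est after a 0.
        d = 1 if y > h1_est * d_prev else -1
        # LMS update of both taps from the error against the expected level.
        err = y - (h0_est * d + h1_est * d_prev)
        h0_est += mu * err * d
        h1_est += mu * err * d_prev
        x_prev, d_prev = x, d
    return h0_est, h1_est


h0_est, h1_est = adapt_look_ahead_thresholds()
print(round(h1_est, 2))   # close to 0.6: thresholds settle near +/-0.6
```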

## 5 Conclusions

The above analysis shows that equalization and clock recovery must be carefully defined, taking into account communication standard requirements such as frequency drift, data pattern requirements and data encoding. Channel pre-shaping, either by analog boost equalizers or by duobinary signaling, proves to be a general technique to simplify equalization requirements, but crosstalk may represent the ultimate limit due to the reduced signal amplitude. Channel boosting at the Nyquist frequency is not mandatory for data recovery, but clock recovery limitations can add further constraints, preventing full exploitation of the capabilities of linear and nonlinear (DFE) channel equalization.

## References

1. M. Sorna, T. Beukema et al., “A 6.4 Gb/s CMOS SerDes Core with Feedforward and Decision-Feedback Equalization”, ISSCC Dig. of Tech. Papers, pp. 62–63, Feb. 2005.

2. R. Payne, B. Bhakta et al., “A 6.25 Gb/s Binary Adaptive DFE with First Post-Cursor Tap Cancellation for Serial Backplane Communications”, ISSCC Dig. of Tech. Papers, pp. 68–69, Feb. 2005.
3. M. Meghelli, S. Rylov et al., “A 10 Gb/s 5-Tap-DFE-4-Tap-FFE transceiver in 90 nm CMOS”, ISSCC Dig. of Tech. Papers, pp. 80–81, Feb. 2006.
4. K. J. Wong, C. K. Yang, “A Serial-Link Transceiver with Transition Equalization”, ISSCC Dig. of Tech. Papers, pp. 82–83, Feb. 2006.
5. Fibre Channel, “Physical Interface-4 (FC-PI-4)”, Int. Committee for Information Technology Standardization (INCITS), Rev. 7, Sept. 2007.
6. R. Kajley, P. Hurst, “A Mixed-Signal Decision-Feedback Equalizer That Uses a Look-Ahead Architecture”, IEEE J. Solid-State Circuits, Vol. 32, No. 3, March 1997.
7. S. Gondi, B. Razavi, “Equalization and Clock and Data Recovery Techniques for 10-Gb/s CMOS Serial-Link Receivers”, IEEE J. Solid-State Circuits, Vol. 42, No. 9, September 2007.
8. M. Harwood, N. Warke et al., “A 12.5 Gb/s SerDes in 65 nm CMOS Using a Baud-Rate ADC with Digital Receiver Equalization and Clock Recovery”, ISSCC Dig. of Tech. Papers, pp. 436–437, Feb. 2007.
9. J. L. Sonntag, J. Stonick, “A Digital Clock and Data Recovery Architecture for Multi-Gigabit/s Binary Links”, IEEE J. Solid-State Circuits, Vol. 41, No. 8, August 2006.
10. K. Yamaguchi, K. Sunaga et al., “12 Gb/s Duobinary Signaling with x2 Oversampled Edge Equalization”, ISSCC Dig. of Tech. Papers, pp. 70–71, Feb. 2005.
11. J. H. Sinsky, M. Duelk et al., “High-Speed Electrical Backplane Transmission Using Duobinary Signaling”, IEEE Trans. on Microwave Theory and Techniques, Vol. 53, No. 1, January 2005.
12. V. Stojanovic, A. Ho et al., “Autonomous Dual-Mode (PAM2/4) Serial Link Transceiver With Adaptive Equalization and Data Recovery”, IEEE J. Solid-State Circuits, Vol. 40, No. 4, April 2005.

# Top-Down Bottom-Up Design Methodology for Fast and Reliable Serdes Developments in nm Technologies

Jan Crols

**Abstract** This paper describes the development of high speed serial data communication links from the viewpoint of signal and circuit complexity. It proposes a development method to deal reliably and affordably with the increasing complexity. Two example implementations are discussed: a 10 Gbps link in 0.13 $\mu\text{m}$ CMOS and a 2.5 Gbps PCI-Express link in 90 nm CMOS.

## 1 Introduction

Over the past ten years high speed serial data communication has undergone an amazingly fast evolution. Not so long ago, high speed serial data communication above 1 Gbps was limited to high-end long-range applications, since the only available technology capable of achieving the required performance was typically a bipolar technology, resulting in serious cost issues. Examples are the many bipolar ASSPs that used to serve the SONET and SDH telecom markets [1, 2].

With the advent of the 0.18 $\mu\text{m}$ and especially the 0.13 $\mu\text{m}$ CMOS technologies around the year 2000, the technologies that allow massive digital signal processing and handling also became capable of performing the functions needed for multi-Gbps serial data communication. This rapidly turned high speed serial data communication into an ASIC market, in which it became a key differentiating analog IP for digital ASIC providers. Indeed, in many cases an ASIC provider would be selected based on whether it had the key high speed serial IO IPs in house to achieve the IO and package cost reduction required by its customers. Examples are the video processor chips that needed to adopt PCI-Express, HDMI or DisplayPort IO in order to make a difference.

Over the last few years high speed serial data communication has again taken an important step. It has entered many different applications in many different forms, and at the same time it has become a commodity, non-differentiating IP that is seen as part of the standard IO library for 90 nm and 65 nm technologies. Remarkable in this evolution, and perhaps a reason for this fast transition from expensive external component to cheap and abundant internal SOC component, is that the speed of operation is actually not evolving this fast at all. The reason is the bandwidth limitation set by the communication media in use. Once the 1 Gbps limit was passed, things went fast with operation at 2.5/3.125 Gbps, but even today 10 Gbps per lane communication is still not widely adopted, and 6.5 Gbps has become an in-between step that is taking more time than perhaps originally anticipated [3].

---

J. Crols (✉)  
AnSem, Heverlee, Belgium  
e-mail: Jan.crols@ansem.com

The amazingly rapid market evolution of high speed serial data communication has offered new opportunities for the design community, as the existing serdes and CDR techniques needed to be combined with CMOS RF design techniques. The development of a high speed serial data communication transmit and receive channel requires advanced analog and RF design techniques at many different levels of abstraction. Now that it has in many cases become commodity IP, we must find ways to handle this complexity within an affordable development time, at an affordable budget and especially at low risk, since a failed development would have an unacceptable impact on both development time and budget.

This paper gives an overview of the techniques and technologies that are used, and need to be combined, in the development of high speed serial data communication transmit and receive channels up to 10 Gbps. It examines the design methodologies that can be used to handle this level of complexity efficiently. Finally, two examples are analyzed to illustrate this.

## 2 The Complexity of a Serdes System

There are many different ways to realize a high speed serial data communication transceiver, and each can have its merits for specific application targets. Apart from different trade-offs between link quality and power consumption or silicon area, there are also considerations such as the CMOS technology node used and whether multi-lane or multi-rate operation must be supported. Nonetheless, the following functions can be identified in almost any transceiver:

- a PLL running at the data rate
- a serializer
- a line driver, with or without pre-emphasis
- a line receiver, with or without an equalization function
- a clock and data recovery loop
- a deserializer

If we want to analyze the complexity of implementing a high speed serial data communication transceiver, there are many ways to look at it, and each is valid and brings its own set of considerations. One view is the selected architecture for the above functions. Another important view is to look at the types of signals present in the architecture. In principle one could say that everything is digital, since the incoming and outgoing data streams are digital, but this is clearly a gross oversimplification. On the one hand, many purely analog signals are present, especially the bias signals and the control signals in a PLL; on the other hand, there are many different types of digital signals, each with its own requirements for the block that processes it. Figure 1 illustrates these on the block diagram of the PCI-Express receiver that is discussed as an example further on.

**Fig. 1** Example receiver architecture showing different signal types as described in the text

1. *Synchronous digital signals for which abstraction of analog properties can be made:* Examples are found mainly in the operations performed on the parallel data stream, like 8b/10b coding, comma detection or FIFO functions that allow clock alignment with the core digital logic. For the development of these blocks, a classical digital development flow can be used, including RTL coding, synthesis, timing closure using conservative timing constraints and standard cell place and route. Operation speeds are typically up to a few hundred MHz.
2. *Synchronous digital signals that become so fast that analog properties start to become relevant:* An example of such signals can be found in the digital loop filter in the PCI-Express example further on in this paper. These blocks may still use the standard digital CMOS libraries, but the speed or timing constraints have become so critical that a classical digital flow would no longer achieve timing closure, due to the margins that this flow takes at different levels. To overcome these limitations, higher speed logic styles such as CML or TSPC logic may be used. At the same time, an analog design approach is needed for the timing verification, so that a true set-up and hold time verification can be performed over all possible process corners (not just fast and slow), potentially including Monte Carlo simulations to cover device matching variations. The limitation for the digital loop filter does not necessarily lie in the clock rate, as it typically runs at 1/8th or 1/16th of the data rate, e.g. 312.5 MHz for 2.5 Gbps operation. The challenge lies in the allowed latency, which must be kept low to assure stability of the CDR loop. Performing all filter functions within the limited number of clock cycles requires manual design using analog verification tools. The serializer is another important block that performs a synchronous operation and needs careful manual design work, especially when implemented as a logarithmic tree. Compared to the deserializer it brings an extra level of complexity, since in the serializer the clock tree with its dividers runs in the direction opposite to the data stream, meaning that the delay times of the clock dividers are subtracted from the available flip-flop set-up times. This block uses CML logic, especially in the sections with the highest clock rate. Relevant aspects that must then be closely analyzed and verified are inter-symbol interference (ISI), PSRR (delay variations due to power supply variation), lane-to-lane crosstalk and DC offset, all of which can reduce the timing margins. The good news is that it is still a synchronous design, meaning that data refreshing is still possible: these degrading effects disappear from the data stream each time the data is reclocked, so they are only critical for the flip-flop to flip-flop timing.

3. *Asynchronous digital signals*: These are, of course, the signals that high speed serial data communication is all about. There is no related clock anymore, so reclocking to regenerate the data signals is not possible. ISI, PSRR-induced jitter, crosstalk and offset-induced duty cycle errors now accumulate over the processing stages. It is therefore of vital importance to limit the number of processing stages to the absolute minimum: basically the line driver, the line receiver and the communication medium.
4. *Clock signals*: Clock signals deserve special attention. They basically fall under the asynchronous digital signals, and they are present at different speeds in all the blocks of the transmitter and receiver. They also accumulate signal degradation, with the difference that ISI is not an issue and duty cycle errors only matter if both rising and falling edges are used for clocking.
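The accumulation of degradation along an un-reclocked path can be quantified with the common dual-Dirac convention, in which random jitter (RJ) from independent stages adds in root-sum-square fashion, deterministic jitter (DJ) adds linearly, and total jitter at a BER of $10^{-12}$ is approximately $DJ + 2 \cdot 7.034 \cdot RJ_{rms}$. This convention and the per-stage numbers below are assumptions for illustration, not values from the paper.

```python
import math


def accumulated_jitter(stages, q_ber=7.034):
    """Combine per-stage jitter (in UI) for a chain with no reclocking.

    Each stage is an (rj_rms, dj_pp) pair. RJ adds in root-sum-square,
    DJ adds linearly; TJ uses the dual-Dirac estimate DJ + 2*Q*RJ,
    with Q = 7.034 corresponding to a BER of 1e-12.
    """
    rj = math.sqrt(sum(r * r for r, _ in stages))
    dj = sum(d for _, d in stages)
    return rj, dj, dj + 2 * q_ber * rj


# Hypothetical chain: line driver, channel, line receiver (values in UI)
chain = [(0.01, 0.05), (0.0, 0.20), (0.01, 0.05)]
rj, dj, tj = accumulated_jitter(chain)
print(round(rj, 4), round(dj, 2), round(tj, 3))
```

Every extra stage in the chain adds to both sums, which is why the text insists on keeping the asynchronous path as short as possible.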

Even with this categorization of signal types, things can get more complex still. The distributed transmission line model used in [3] and [4], shown in Fig. 2, is an example. When distributing high frequency clock signals over longer distances, typically to serve as a reference for multiple lanes, one option is to amplify the clock signal with several intermediate buffers to limit the loading of each section, but these buffers add jitter to the clock signal, which accumulates and cannot be removed anymore. A better alternative may therefore be to use no buffering at all and find other ways to drive the heavily loaded long line. This requires a distributed line model in combination with techniques such as direct loading of the VCO [4] and inductive termination of the line [3].



**Fig. 2** Architecture using transmission line for clock distribution

Apart from recognizing all the different signal types and designing for them accordingly, for each of them it is also important to know how the different sources of interference can couple onto the signal. There are different types of interference sources, like thermal noise, related data from the same lane and unrelated data from adjacent lanes, but there is also the way in which they interfere. There are in fact many possible coupling paths: apart from direct capacitive and inductive signal coupling, there is IR drop, high frequency power supply coupling, substrate coupling, package coupling, etc. Each of these must be analyzed and optimized at the right moment during the development.

## 3 Design Flow, Tooling

Figure 3 depicts the general outline of the development flow. During high level design, several potential architectures are examined, and block parameters and properties are swept and explored, in order to find the most suitable architecture and the best corresponding block specifications for it. Since this involves a very wide exploration of a design space that includes the whole system, fast simulation and calculation times are needed. These can be obtained by using very generic first-order behavioral models or descriptions for the blocks in a mathematical signal processing tool.

**Fig. 3** Top-down bottom-up development flow

The high level of abstraction needed in the high level design phase creates the need for a separate high level verification step. In this step one tries to verify early in the development process whether the level of abstraction used during high level design was sufficient, and whether simplification has led to the selection of wrong block parameters that would only prove incorrect at the end of the development, because of second-order effects that should have been taken into account. A typical, well-suited tool for such verification work is a mixed-signal behavioral description and simulation language like VHDL-AMS. With a careful description of the right behavioral model for each block, it allows detailed system level simulation, although for truly open exploration work the required coding and simulation times may be too long.

In fact, if a tool like VHDL-AMS is used for high-level verification, it makes sense to use VHDL as the high level design tool. Although primarily seen as a digital description language, it is in many cases very well suited for the rapid coding of a serdes architecture. The reason is that almost all signals are in fact digital signals whose most important property is their threshold crossing point and how it is influenced by the different factors that introduce jitter (uncertainty on the timing of the crossing point). This can be handled perfectly, and with fast enough simulation times, by an event-driven simulation tool like a VHDL simulator. Analog signals present in a serdes often change at low enough rates that they can be handled by the event-driven simulator without significant deviations. Using VHDL for the high level design allows an easy, step-by-step increase of the complexity of the behavioral model in VHDL-AMS by introducing effects such as driving and load impedance or coupling. This increased complexity also requires an adaptive time-step solver algorithm for the simulation, which significantly increases the system simulation time, so it is best reserved for the high-level verification process.
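The idea that a serdes signal is mostly described by its threshold-crossing times can be illustrated outside VHDL as well. The Python sketch below is a hypothetical, minimal event-based model: an NRZ stream is represented only by its list of edge times, jitter is added by perturbing those times, and a sampler recovers the level at any instant from the number of edges that precede it. It mirrors the event-driven idea only and makes no claim about how the author's VHDL models are written.

```python
import bisect
import random


def nrz_edges(bits, ui=1.0):
    """Edge (threshold-crossing) times of an NRZ stream starting at t = 0."""
    return [n * ui for n in range(1, len(bits)) if bits[n] != bits[n - 1]]


def add_random_jitter(edges, rj_rms, rng):
    """Perturb every crossing time with Gaussian random jitter (same units as edges)."""
    return sorted(t + rng.gauss(0.0, rj_rms) for t in edges)


def sample(first_bit, edges, t):
    """Logic level at time t: the first bit toggled once per preceding edge."""
    n_edges = bisect.bisect_right(edges, t)
    return first_bit ^ (n_edges & 1)


rng = random.Random(7)
bits = [rng.randint(0, 1) for _ in range(2000)]
edges = add_random_jitter(nrz_edges(bits), rj_rms=0.03, rng=rng)
# Sample in the middle of each UI and count bit errors.
errors = sum(sample(bits[0], edges, n + 0.5) != b for n, b in enumerate(bits))
print(errors)
```

Only the edge list is stored and processed, so the cost scales with the number of transitions rather than with a fixed simulation time step, which is the property that makes event-driven simulation fast.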

Once the architecture has been selected and the building block specifications have been fixed, the actual design work at transistor level can start. This requires a SPICE-like simulator, potentially in combination with a harmonic balance simulator for properties such as phase noise or PSRR. In this transistor level design, a multi-mode simulation tool that allows combining the extended high-level behavioral model with parts gradually replaced at transistor level is of utmost importance, to rapidly verify whether the originally assumed block requirements still hold when the detailed transistor level block descriptions and properties are taken into account. Moreover, it is important to model power supply lines, package, decoupling, etc. early on, and here again a multi-mode simulation tool allows system level simulations with this level of detail for some parts. A point that remains critical in high speed serial data communication design, especially for multi-lane set-ups, is a good way to model substrate coupling and add it to the whole process. Often not much more can be done than using careful heuristic shielding rules and a deep n-well substrate separation strategy.

The top-down bottom-up approach, in which behavioral models are gradually replaced one by one by transistor level descriptions, can be extended further to include results from layout back-annotation and run these in system level checks.

In the end, it remains of vital importance to run full transistor level checks of the whole system with its power supply, package and decoupling model. However, the complexity is so large that only very limited checks can be performed; they can only serve to check whether the extensive top-down bottom-up checks have overlooked an important aspect.

## 4 Example 1: A General Purpose 10 Gbps Link in 0.13 $\mu\text{m}$ CMOS

The first example discussed here is a general purpose 10 Gbps serial link in 0.13 $\mu\text{m}$ CMOS, suited for applications such as SONET OC-192. This means its focus is on high performance and high data rate throughput. Figure 4 shows its block diagram.

With each new step in the data rate of serial links, the requirements on high frequency jitter generation become stricter. Basically, the generated jitter must scale with the reduction of the bit length. In a typical transmit module, the generated random jitter is mainly determined by the oscillator that generates the clock for the serializer and line driver. A ring oscillator is the most practical and lowest cost oscillator implementation that can achieve relatively high speeds. The problem with a ring oscillator is that, when low phase noise is required, the performance improvement expected from finer line IC technologies remains limited. As a result, ring oscillators are no longer an option for 10 Gbps operation in high performance applications, even in a 90 nm or 65 nm CMOS technology. The alternative is then an LC oscillator with an integrated inductor and varactor. This has the potential for an order of magnitude better phase noise and PSRR performance, but at a much larger chip area cost. The only way to reduce this cost is to share the LC VCO among multiple RX and TX modules in a multi-channel set-up: the more modules served from the same LC VCO, the lower its overhead cost. The number of modules that can be served from a single LC VCO is limited by the distribution of its high frequency clock signals. Long distances require intermediate buffering to avoid jitter degradation due to excessive loading, but each buffer itself also adds jitter. In this example the set-up is limited to two lanes per LC-VCO PLL, but margin remains to increase this to a four-lane set-up.

**Fig. 4** 2-lane 10 Gbps serdes architecture

The jitter that a receive module must tolerate is normally significantly larger than the jitter that a transmitter is allowed to generate. As a result, a ring oscillator (RO VCO) remains an option for the receiver. A ring oscillator can be placed in each receive module, which avoids the problems of full-rate clock distribution. The ring oscillator becomes even more attractive if it can run at half rate, which is possible by doubling the sampling flip-flops in the phase detector in parallel. Duty cycle mismatch due to half-rate sampling is less of a problem in the receiver than in the transmitter.

Figure 5 shows a simplified block diagram of a TX module. A logarithmic tree 8-to-1 multiplexer is used as serializer. The implementation of the serializer uses single-ended logic for the 8-to-4 conversion and differential CML logic for the 4-to-1 stage.

**Fig. 5** TX module block diagram

**Fig. 6** RX module block diagram

Figure 6 shows the architecture of the RX module. The locking range of the RO VCO is reduced by using a replica RX PLL that locks to 5 GHz and sets the biasing for the RO VCO in the RX [5]. The RO VCO in the RX is a phase-interpolating RO VCO that is directly phase shifted at the full 5 GHz update rate by the UP and DOWN pulses from the PD, which directly increase or decrease the delay of an extra delay element in the two stages of the RO VCO.

An impression of the layout of a 2-lane implementation of this 10 Gbps serdes system is given in Fig. 7. The power consumption of the full 2-lane implementation is 600 mW from the 1.2 V and 3.3 V supplies combined. The overall area of the full 2-lane IP is 2.45 mm<sup>2</sup>, including pad area.

## 5 Example 2: PCI-Express in 90 nm

The second example is a multi-lane 2.5 Gbps PCI-Express implementation in 90 nm CMOS. Parallel data connections are omitted for clarity. Each lane contains its own PLL, which runs a quadrature RO VCO whose frequency is locked to the incoming reference frequency. The quadrature output signals of the RO VCO are used directly in the nearby RX and TX modules. Because the RO VCO is placed locally, no clock distribution is needed; this saves the buffering power and avoids the extra jitter that buffering would add, at the cost of the extra power and area of an individual RO VCO in each lane.

**Fig. 7** Layout view of the 2-lane 10 Gbps serdes IP



The TX module requires only a differential input clock to drive its serializer, while the RX uses a phase interpolator that runs from a differential quadrature clock input.

The PCI-Express TX module also uses a logarithmic tree 8-to-1 multiplexer as serializer. The serializer uses single-ended logic for the 8-to-2 conversion and differential logic for the 2-to-1 stage. The line driver can deliver $1.05\ \text{V}$ differential peak-to-peak into a $100\ \Omega$ differential load and runs directly from the $1\ \text{V}$ supply. It includes the receiver detection functions and a programmable pre-emphasis.

Figure 8 shows the architecture of an RX module. The clock and data recovery in the RX uses a phase interpolator and a digital loop filter to lock to the incoming data stream [6, 7]. In a bang-bang phase detector, the data and the data edges are sampled with a clock from the phase interpolator, whose phase can be varied in 32 steps. The resulting early-late signals are downsampled by a factor of 8 and then accumulated in a digital loop filter. The digital loop filter contains both a phase and a frequency tracking register; the latter allows frequency deviations of more than $\pm 3000$ ppm to be tracked. The phase detector consists of a first stage of CML flip-flops, directly followed by a differential-to-single-ended conversion; the subsequent logic runs in TSPC logic at full rate and at 1/8th of the rate. The total delay in the CDR control loop is limited to 40 full-rate clock cycles. The RX module is complemented by a full-rate 1-to-8 logarithmic deserializer that uses TSPC logic for the first reduction of the data rate by 2, after which standard CMOS logic is used.

**Fig. 8** PCI-Express RX architecture using a digital loop filter and phase interpolator for CDR
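The phase-and-frequency loop filter described above can be illustrated with a behavioral bang-bang CDR model. The sketch assumes 32 interpolator steps per UI, a transition on every bit, decimation of the early/late votes by 8, a 3000 ppm data frequency offset and hypothetical gains `kp` and `ki`; it ignores the 40-cycle loop latency and is not the implemented filter.

```python
def simulate_bb_cdr(offset_ppm=3000.0, steps_per_ui=32, decim=8,
                    kp=1 / 16, ki=1 / 1024, n_ui=200_000):
    """Bang-bang CDR with a phase path and a frequency register.

    Returns the final frequency register (interpolator steps per update)
    and the worst phase error (UI) over the second half of the run.
    """
    drift = offset_ppm * 1e-6            # data phase advance per UI, in UI
    code = 0                             # phase interpolator setting (steps)
    phase_acc = 0.0                      # fractional step accumulator
    freq = 0.0                           # frequency tracking register
    votes = 0
    worst = 0.0
    for n in range(1, n_ui + 1):
        err = n * drift - code / steps_per_ui    # data phase minus clock phase
        votes += 1 if err > 0 else -1            # early/late decision
        if n > n_ui // 2:
            worst = max(worst, abs(err))
        if n % decim == 0:                       # loop filter runs at 1/decim rate
            freq += ki * votes                   # frequency (integral) register
            phase_acc += kp * votes + freq       # proportional path plus frequency
            step = int(phase_acc)                # whole interpolator steps to apply
            code += step
            phase_acc -= step
            votes = 0
    return freq, worst


freq, worst = simulate_bb_cdr()
print(round(freq, 2), round(worst, 3))
```

The frequency register should settle near $3000 \cdot 10^{-6} \cdot 32 \cdot 8 \approx 0.77$ steps per update. With `ki=0.0` the proportional path alone saturates at 0.5 step per update (about 1950 ppm), so the phase error grows without bound; this is the role the text assigns to the frequency tracking register.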

A chip layout impression of a single channel is shown in Fig. 9. The chip is fabricated in a 90 nm CMOS technology and runs entirely from the 1.0 V supply. The



**Fig. 9** Layout view of the PCI-Express serdes IP

PLL with the quadrature RO VCO measures  $52000 \mu\text{m}^2$  and consumes 15.2 mW. The TX module measures  $100000 \mu\text{m}^2$  and consumes 43 mW, while the RX module measures  $105000 \mu\text{m}^2$  and consumes 19.4 mW.
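Summing these figures gives a rough per-lane budget. It covers the PLL, TX, and RX blocks only and ignores any shared circuitry that may exist:

$$
A_{\text{lane}} = 52\,000 + 100\,000 + 105\,000 = 257\,000\ \mu\text{m}^2 \approx 0.26\ \text{mm}^2
$$

$$
P_{\text{lane}} = 15.2 + 43 + 19.4 = 77.6\ \text{mW}, \qquad \frac{P_{\text{lane}}}{2.5\ \text{Gb/s}} \approx 31\ \text{pJ/bit}
$$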

## 6 Conclusions

In this paper it was argued that CMOS integration of high-speed serial data communication IP has moved from differentiating IP to commodity IP even faster than other technology areas. This requires an adapted development flow for these types of IP. The complexity of the development was described from the viewpoint of the signal types. The proposed development flow is based on VHDL for high-level design, VHDL-AMS for high-level verification, and multi-mode simulation for top-down/bottom-up circuit design. Two example implementations were discussed: a 10 Gbps link in $0.13 \mu\text{m}$ CMOS and a 2.5 Gbps PCI-Express link in 90 nm CMOS.

## References

1. P.C. Pham, J. McDonald, P. McDevitt, “A 2.5 Gb/s 32:1/1:32 Sonet Mux/Demux Chip Set”, Proceedings of the ISSCC, IEEE, February 1996, pp. 120–121.
2. R. Walker, C. Stout and C.-S. Yen, “A 2.488 Gb/s Si-Bipolar Clock and Data Recovery IC with Robust Loss of Signal Detection”, Proceedings of the ISSCC, IEEE, February 1997, pp. 246–247.
3. J. Poulton, et al., “A 14-mW 6.25-Gb/s Transceiver in 90-nm CMOS”, IEEE JSSC, Vol. 42, No. 12, December 2007, pp. 2745–2757.
4. T. Geurts, et al., “A 2.5 Gbps – 3.125 Gbps multi-core serial-link transceiver in $0.13 \mu\text{m}$ CMOS”, Proceedings of the 30th European Solid-State Circuits Conference, September 2004, pp. 487–490.
5. J.G. Maneatis, “Low-Jitter Process-Independent DLL and PLL Based on Self-Biased techniques”, IEEE JSSC, Vol. 31, No. 11, November 1996, pp. 1723–1732.
6. H. Tamura, et al., “5 Gb/s bidirectional balanced-line link compliant with plesiochronous clocking,” Proceedings of the ISSCC, IEEE, February 2001, pp. 64–65.
7. S. Sidiropoulos, M.A. Horowitz, “A semidigital dual delay-locked loop,” IEEE JSSC, Vol. 32, No. 11, November 1997, pp. 1683–1692.

# Mixed-Signal Implementation Strategies for High Performance Clock and Data Recovery Circuits

Michael H. Perrott

**Abstract** In implementing high performance clock and data recovery (CDR) circuits, there is an interesting tradeoff offered between analog and digital circuit implementations. Analog circuits provide a relatively low power and low area approach to performing high speed, continuous-time processing of signals, but lack the ability to perform sophisticated processing tasks that demand high accuracy and repeatability. In contrast, digital circuits readily provide the ability to perform complex processing tasks with high repeatability, but can be costly in terms of power and area when high resolution is required at high speeds. A mixed-signal approach to implementation combines both analog and digital circuits (i.e., a hybrid approach) such that each performs the tasks best suited to its strengths in order to accomplish the desired functionality.

In this chapter, we examine mixed-signal implementation techniques that enable high performance CDR circuits. We do this by example, and present a 2.5 Gbit/s, fully integrated CDR in 0.25 micron CMOS that utilizes a hybrid phase-to-digital converter, loop filter, and VCO to achieve 1.4 ps of rms jitter with a compact implementation that fits within a 5 mm by 5 mm package. In addition, an all-digital frequency acquisition method is utilized which allows acquisition times of less than 2 ms without the need for an external frequency reference.

## 1 Introduction

Clock and data recovery (CDR) circuits are an essential component of high speed networks, allowing high rates of data transfer without the need for accompanying clock signals. As shown in Fig. 1, a CDR circuit has the function of generating a clock which is aligned in both phase and frequency to an incoming data signal. This clock is produced by a voltage-controlled oscillator (VCO) whose frequency and phase are controlled by a feedback loop which includes a phase detector and

---

M.H. Perrott (✉)  
Massachusetts Institute of Technology, Cambridge, MA, USA  
e-mail: mhperrott@gmail.com



**Fig. 1** A classical analog clock and data recovery circuit for use in an optical data network

analog loop filter. The loop filter acts to smooth out the pulsed output of the phase detector, and typically consists of a charge pump and a passive capacitor/resistor network.

Key performance metrics for CDR circuits are primarily focused on their noise performance, and include jitter generation, jitter tolerance, and jitter transfer. Achievement of low jitter generation implies that the CDR will produce an output clock that has low jitter in the presence of a clean data input signal. Achievement of high jitter tolerance implies that the CDR will correctly reproduce the data signal at its output despite the presence of high jitter on the data input signal. Finally, achievement of a desired jitter transfer characteristic requires that the CDR lowpass filter the jitter on the data input signal such that its impact on the CDR output clock is appropriately reduced.

For high performance data network standards such as SONET, the jitter transfer specification must be tightly controlled with respect to both its bandwidth and peaking. As shown in Fig. 2, a CDR circuit with linear behavior will typically exhibit peaking in its closed loop transfer function due to the presence of a zero in the loop filter. To achieve the desired condition of low peaking, it is necessary to place the zero at a low frequency relative to the closed loop bandwidth, which requires a relatively large capacitor value for  $C_{int}$  (shown in Fig. 1). A large capacitor implementation is highly undesirable since its corresponding area within an integrated loop filter can be prohibitively large for certain SONET applications.
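The peaking/zero trade-off can be made concrete with the textbook linear model of a charge-pump CDR with a series R plus $C_{int}$ loop filter. In that model the closed-loop jitter transfer is $H(s) = (2\zeta\omega_n s + \omega_n^2)/(s^2 + 2\zeta\omega_n s + \omega_n^2)$ and the loop-filter zero sits at $\omega_z = \omega_n/(2\zeta)$. The Python sketch below illustrates that model, not the design discussed here, by sweeping $|H|$ numerically. Peaking is roughly 2.1 dB at $\zeta = 0.707$, and staying below 0.1 dB requires $\zeta$ above roughly 4.3. At $\zeta = 5$ the peaking is about 0.08 dB, and the zero lies about $4\zeta^2 \approx 100$ times below the approximate closed-loop bandwidth $2\zeta\omega_n$. Since $C_{int} = K/\omega_n^2$ in this model ($K$ lumping the detector, charge-pump, and VCO gains), raising $\zeta$ at a fixed bandwidth makes $C_{int}$ grow roughly as $\zeta^2$.

```python
import math


def jitter_transfer(w: float, zeta: float, wn: float = 1.0) -> complex:
    """Closed-loop jitter transfer of a type-II, second-order linear CDR model."""
    s = 1j * w
    return (2 * zeta * wn * s + wn**2) / (s**2 + 2 * zeta * wn * s + wn**2)


def peaking_db(zeta: float, points: int = 20000) -> float:
    """Maximum of |H(jw)| in dB over a log sweep from 1e-3*wn to 1e2*wn."""
    peak = 0.0
    for k in range(points):
        w = 10 ** (-3 + 5 * k / (points - 1))
        peak = max(peak, abs(jitter_transfer(w, zeta)))
    return 20 * math.log10(peak)


for zeta in (0.707, 2.0, 5.0):
    # In this model the loop-filter zero sits at wz = wn / (2*zeta),
    # while the closed-loop bandwidth is roughly 2*zeta*wn for large zeta.
    print(f"zeta={zeta}: peaking={peaking_db(zeta):.3f} dB, wz/wn={1 / (2 * zeta):.3f}")
```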

With progressively increasing digital density achievable with modern CMOS processes, a very attractive option is to replace the classical analog loop filter with

**Fig. 2** Typical closed loop jitter transfer characteristic of a linear CDR circuit



a digital implementation which realizes the desired filtering behavior [1, 2]. The advantage of a digital filter implementation is that it allows realization of long time constants with small area, so that the need for large capacitors is completely avoided. However, as shown in Fig. 3, the use of a digital loop filter introduces new challenges to the phase detector and VCO. In particular, the phase detector must be altered such that it produces a reasonably high resolution *digital* signal that represents the phase difference between the clock and data signals. The VCO must be altered such that it changes its instantaneous frequency according to a digital signal while still maintaining low jitter generation. For most applications, the achievement of a low power implementation is highly desirable, and low area is important for achieving low cost.

The relative difficulty of the above challenges is a function of the application space, such that a digital CDR implementation is quite straightforward in some cases, but impractical in others. As such, a *mixed-signal* approach, which appropriately combines *both* analog and digital circuits, can provide a more practical alternative to a purely digital implementation in some applications. In striving for a mixed-signal design, the choice of appropriate boundaries between analog and



**Fig. 3** A digital loop filter implementation and its associated challenges

digital signaling and circuits can play a major role in achieving excellent performance with low power and area for a given application.

In this chapter, we will present strategies for achieving mixed-signal CDR implementations that achieve excellent performance with reasonably low area and power. We will do this through example, and focus on a high performance CDR implementation in 0.25 μm CMOS intended for 2.5 Gbit/s SONET applications. In particular, we will discuss the issues associated with achieving a high resolution phase detector with a digital output, a hybrid VCO which efficiently achieves low noise, a hybrid loop filter which allows a compact and high resolution implementation, and an all-digital frequency acquisition method that does not require a reference frequency. These examples will highlight the relative strengths of analog and digital circuits, and the key issues faced in achieving an efficient combined implementation.

## 2 High Performance Phase Detection with a Digital Output

The simplest implementation of a phase detector with digital output is the bang-bang structure shown in Fig. 4 [3]. The key operating principle of this circuit is to sense whether an input data edge is before or after a corresponding clock edge, and then output a positive or negative pulse depending on the outcome. In cases where there is no data transition close to a clock edge, the phase detector maintains an output of zero. Unfortunately, the bang-bang detector leads to highly nonlinear behavior for the CDR since it can distinguish only the sign of the phase error, and not its magnitude. Since a consistent jitter transfer function implies linear dynamics, the bang-bang phase detector will not be suitable for high performance CDR applications such as SONET.
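A common textbook formulation of the Alexander detector [3] reduces the bang-bang decision to three samples: two data samples at successive bit centers and one edge sample between them. The Python sketch below follows that formulation. The sign convention (+1 = clock late) is an assumption, and the circuit of Fig. 4 may differ in detail. Note that the output carries only the sign of the phase error.

```python
def alexander_pd(prev_bit: int, edge_sample: int, cur_bit: int) -> int:
    """Bang-bang (Alexander) phase detector decision from three samples.

    prev_bit and cur_bit are data samples taken at successive bit centers;
    edge_sample is taken midway between them, near the expected data edge.
    Returns +1 if the clock is late, -1 if it is early, 0 without a transition.
    Only the sign of the phase error is available, never its magnitude.
    """
    if prev_bit == cur_bit:
        return 0  # no data transition: hold the output at zero
    # If the edge sample already shows the new bit, the data edge came
    # before the edge sample, i.e. the clock is late.
    return 1 if edge_sample == cur_bit else -1


decisions = [alexander_pd(*s) for s in [(0, 1, 1), (0, 0, 1), (1, 1, 1)]]
```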

One possible way of linearizing the bang-bang detector is to add extra levels to it as shown in Fig. 5. In this case, the magnitude of the phase error can be sensed in addition to its sign, though only in discrete intervals which are set by the delay of the buffers shown in the figure. In the case where the input data signal has sufficient jitter to exercise the various levels of this detector, the CDR dynamics will behave in a reasonably linear manner such that the jitter transfer characteristic becomes well defined. Unfortunately, since each portion of the detector must operate at a high clock frequency (often in the GHz range), power consumption can be an issue for



**Fig. 4** A bang-bang phase detector



**Fig. 5** A multi-level bang-bang phase detector

this structure. Also, in cases where the jitter on the input data is small, nonlinear behavior may again result in a poorly defined jitter transfer function.
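An idealized model of the multi-level detector treats it as a quantizer of the phase error. The step size is the buffer delay $\tau$, and the quantizer saturates once the error exceeds the span of the delayed clock taps. This Python sketch assumes equal buffer delays and taps placed symmetrically around the nominal clock edge. The positive sign for a data edge arriving after the clock edge is an assumed convention. Setting `n_taps = 0` recovers the plain bang-bang detector.

```python
def multilevel_bb_pd(phase_err: float, tau: float, n_taps: int) -> int:
    """Idealized multi-level bang-bang detector output for one data transition.

    phase_err: data-edge time minus nominal clock-edge time (same units as tau).
    Clock taps sit at k*tau for k = -n_taps..n_taps (buffer-delayed copies).
    The output counts taps before the data edge minus taps after it, giving
    odd levels +-1, +-3, ..., +-(2*n_taps + 1): a quantizer with step tau
    that saturates once |phase_err| exceeds n_taps*tau.
    """
    taps = [k * tau for k in range(-n_taps, n_taps + 1)]
    before = sum(1 for t in taps if t < phase_err)
    after = sum(1 for t in taps if t > phase_err)
    return before - after


# Hypothetical 10 ps buffer delay with two taps on each side of the center.
levels = [multilevel_bb_pd(e, 10.0, 2) for e in (-100.0, -3.0, 3.0, 15.0, 100.0)]
```

With small input jitter, |phase_err| stays below $\tau$ and only the ±1 levels are exercised, which is the nonlinear regime described above.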

The shortcomings of the multi-level bang-bang detector highlight the difficulties in achieving high speed digital processing of high resolution signals while maintaining low power consumption. In contrast, high speed analog processing of signals is inherently of a high resolution nature (with noise being the chief limitation), and can often be performed by corresponding analog circuits with relatively low power consumption. This leads to a key principle of designing efficient mixed-signal circuits – leverage analog circuits to achieve high speed processing of high resolution (i.e., continuous) signals in cases where inaccuracy can be tolerated, and digital circuits to achieve more sophisticated processing of low speed signals and/or high speed signals with low resolution.

In line with the above strategy, Fig. 6 shows a proposed mixed-signal phase detector that achieves a digital output by performing high speed analog-to-digital conversion of the output of an analog Hogge phase detector [4]. As shown on the left side of the figure, the Hogge detector creates a pulsed output whose positive pulses have an area corresponding to the phase difference between the data and clock. The negative pulses shown in the figure always have constant area, and are created in order to achieve a net area of zero when the data edges coincide with the falling edge of the clock signal. Since the area of the Hogge pulses provides a continuous, analog representation of the phase error, analog-to-digital conversion is required to achieve an overall digital output. As shown on the right side of the figure, a first-order, continuous-time  $\Sigma-\Delta$  structure provides a very simple implementation for the ADC. While the resulting output consists of only one bit, the effective resolution is actually quite high after (digital) lowpass filtering due to the highly oversampled nature of the signal. For instance, if we assume a multi-GHz clock frequency and a 1 MHz digital lowpass filter bandwidth, the effective oversampling ratio will be close to 1000, such that better than 10-bit ADC performance can be achieved. Such lowpass filtering is inherently provided by the loop filter used within the CDR.
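A discrete-time stand-in for the first-order modulator shows why a 1-bit stream can carry a high-resolution phase error. This Python sketch makes two assumptions: it replaces the continuous-time integrator with a per-clock accumulator, and it normalizes the Hogge net pulse area to [-1, 1]. The SQNR formula is the standard ideal estimate for a first-order, 1-bit modulator and ignores circuit noise, clock jitter, and idle tones.

```python
import math
from typing import List


def first_order_sigma_delta(x: List[float]) -> List[int]:
    """Discrete-time model of a first-order sigma-delta modulator (1-bit output).

    x holds the normalized Hogge net pulse area per clock cycle, in [-1, 1].
    The integrator accumulates input minus fed-back output, and the comparator
    output is +1/-1. The running sum of (x - y) stays bounded, so the
    average of y tracks the average of x.
    """
    integ = 0.0
    y = 0
    out: List[int] = []
    for sample in x:
        integ += sample - y
        y = 1 if integ >= 0 else -1
        out.append(y)
    return out


def ideal_sqnr_db(osr: float) -> float:
    """Ideal peak SQNR of a 1-bit, first-order modulator (quantization noise only)."""
    return 6.02 + 1.76 - 5.17 + 30 * math.log10(osr)


# Constant phase error, averaged by a simple boxcar standing in for the loop filter.
bits = first_order_sigma_delta([0.3] * 10000)
average = sum(bits) / len(bits)          # close to 0.3
osr = 2.5e9 / (2 * 1e6)                  # 2.5 GHz clock, 1 MHz bandwidth
enob = (ideal_sqnr_db(osr) - 1.76) / 6.02
```

For a 2.5 GHz clock and a 1 MHz bandwidth (OSR = 1250), the ideal estimate gives about 15.6 effective bits. That is comfortably above the 10 bits quoted in the text and leaves margin for the non-idealities the model ignores.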

In summary, we have pointed out that a mixed-signal approach to achieve a phase detector with a digital output allows high resolution phase comparison to be achieved with a compact and low power implementation. The digital output allows a digital loop filter to be leveraged in order to avoid large capacitors, and the high resolution of detection preserves linear CDR behavior such that a well defined jitter transfer function can be achieved.



**Fig. 6** A mixed-signal phase detector with digital output

## 3 High Performance Oscillator Structures with Digital Frequency Control

Figure 7 displays a simplified view of how digital control of the frequency of an LC oscillator can be achieved through a switched-capacitor network by altering the resonant frequency of the tank according to the amount of capacitance switched



**Fig. 7** Digital control of an LC oscillator using a switched-capacitor network

into the tank [5]. A similar approach can be applied to a ring oscillator [6], but we will focus on the LC structure since it is currently the preferred implementation for achieving low noise in high performance applications such as SONET.

As mentioned in the introduction, high performance CDR applications demand low jitter from the VCO, which, in turn, demands high resolution when digital control is utilized. For the LC design shown in Fig. 7, high resolution is achieved by dithering between capacitor values (such that the effective capacitance value can be adjusted by fractions of the unit capacitor size) and by having small unit capacitors (which minimizes the resulting noise created by dithering). Unfortunately, the need for a small unit capacitor size results in a complicated design when a wide frequency range is required of the oscillator. A large frequency range is desirable in order to obtain robust manufacturability in the presence of process and temperature variations. Therefore, while there are many merits to this approach, it does require a substantial investment of design resources to achieve high performance operation.
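The capacitance-to-frequency relation and the effect of dithering can be sketched as follows. In this Python example the tank values are hypothetical. First-order accumulator dithering is one common choice; the text does not specify the dithering scheme actually used.

```python
import math
from typing import List


def lc_frequency(l_henry: float, c_farad: float) -> float:
    """Resonant frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def dither_codes(code: float, n: int) -> List[int]:
    """First-order dithering between floor(code) and floor(code) + 1.

    An accumulator adds the fractional part each cycle and emits the upper
    code on overflow. The average switched-in capacitance therefore equals
    code * C_unit even though only whole unit capacitors are switched.
    """
    base = math.floor(code)
    frac = code - base
    acc = 0.0
    out: List[int] = []
    for _ in range(n):
        acc += frac
        if acc >= 1.0:
            acc -= 1.0
            out.append(base + 1)
        else:
            out.append(base)
    return out


# Hypothetical values: 1 nH tank, 4 pF fixed capacitance, 1 fF unit capacitor.
L, C_FIXED, C_UNIT = 1e-9, 4e-12, 1e-15
f0 = lc_frequency(L, C_FIXED)
step = lc_frequency(L, C_FIXED + C_UNIT) - f0  # about -f0 * C_UNIT / (2 * C_FIXED)
codes = dither_codes(10.25, 8)                 # averages to 10.25 unit capacitors
```

The frequency step per unit capacitor scales as $-f_0 \Delta C / (2C)$, so smaller unit capacitors give finer steps and smaller dithering noise. Covering a wide range with such small units is what complicates the design.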

Figure 8 shows an alternative means of achieving digital control of an LC oscillator, which is to simply control the input of a varactor within a hybrid VCO [7] with the output of a digital-to-analog converter (DAC). In order to limit the frequency range required of the varactor (which lowers its influence on the phase noise of the oscillator), a switched-capacitor network can be used to perform coarse calibration of the oscillator in order to remove the impact of process variations [7]. Since the unit capacitor size in the array can be much larger than in the “all-digital” design shown in Fig. 7, its control network is much less complex and a simpler design effort can be applied to achieve high performance. In practice, the coarse calibration is often performed off-line with a frequency acquisition circuit, and the analog varactor is controlled by the feedback action of the phase-locked loop.
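One simple way to perform such coarse calibration is a binary search over the capacitor codes. The Python sketch below assumes that the oscillator frequency can be measured for each code, for example with a counter and a reference as in reference-based acquisition. This is not the reference-less method discussed later in this chapter. All tank values are hypothetical.

```python
import math
from typing import Callable


def lc_frequency(l_henry: float, c_farad: float) -> float:
    """Resonant frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def coarse_calibrate(target_hz: float, freq_of_code: Callable[[int], float],
                     n_codes: int) -> int:
    """Pick the coarse code whose frequency is closest to target_hz.

    freq_of_code(code) must decrease monotonically with code (more
    capacitance switched in). The binary search needs only about
    log2(n_codes) frequency measurements.
    """
    lo, hi = 0, n_codes - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if freq_of_code(mid) > target_hz:
            lo = mid + 1   # still too fast: switch in more capacitance
        else:
            hi = mid
    # lo is the first code at or below the target; compare with its neighbour.
    if lo > 0 and abs(freq_of_code(lo - 1) - target_hz) < abs(freq_of_code(lo) - target_hz):
        return lo - 1
    return lo


# Hypothetical tank: 1 nH, 3.5 pF fixed, 0.3 pF varactor at mid-range,
# 64 coarse codes of 20 fF each.
L, C_FIXED, C_VAR_MID, C_COARSE = 1e-9, 3.5e-12, 0.3e-12, 20e-15


def freq_of_code(code: int) -> float:
    return lc_frequency(L, C_FIXED + C_VAR_MID + code * C_COARSE)


code = coarse_calibrate(2.5e9, freq_of_code, 64)
```

After calibration, the residual error (at most about half a coarse step) is left to the varactor, so the varactor needs only a small tuning range.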



**Fig. 8** Digital control of an LC oscillator using a varactor and DAC

## 4 Demonstrating the Benefits of a Mixed-Signal Approach Through an Efficient Loop Filter Implementation

Now that we have examined the hybrid phase detector and VCO implementations, it is time to turn our attention to the loop filter circuit which connects them. As shown in Fig. 9, it is generally desirable to achieve an integrator plus lead/lag loop filter for the CDR. The integrating portion of this filter ensures that the steady-state phase error of the CDR goes to zero, which is important to achieve the best jitter tolerance from the CDR (i.e., since it allows the CDR output clock to consistently align to the middle of the input data eye with proper design of the CDR). The zero is required for stability, and the pole is generally placed high enough in frequency to avoid significantly impacting the band edge of the closed loop transfer function while still providing some measure of high frequency noise reduction.
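Write the integrator plus lead/lag response with a gain $K$, zero $\omega_z$, and pole $\omega_p$; these symbols are assumed here for illustration and are not taken from the figures. A partial-fraction expansion then shows that the response is exactly the sum of a pure integrator and a first-order lowpass path:

$$
H(s) = \frac{K\,(1 + s/\omega_z)}{s\,(1 + s/\omega_p)}
     = \underbrace{\frac{K}{s}}_{\text{integration path}}
     + \underbrace{\frac{K\,(1/\omega_z - 1/\omega_p)}{1 + s/\omega_p}}_{\text{feedforward path}}
$$

Only the second term needs bandwidth up to $\omega_p$, since the integrator term falls off as $1/\omega$. Because $\omega_z < \omega_p$, the feedforward gain $K(1/\omega_z - 1/\omega_p)$ is positive.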

In the case where a hybrid VCO is used, it is necessary to use a DAC to transfer the digital output of the loop filter to the analog control voltage of the varactor. In such a case, the DAC must support a bandwidth that is considerably larger than the desired closed loop bandwidth of the CDR. Assuming a reasonably high resolution is required of the DAC in order to achieve low jitter generation, the DAC implementation presents a challenge due to its high bandwidth requirement.

The issue of achieving an efficient DAC implementation allows us another glimpse of the benefits of a mixed-signal approach in which we best utilize the attributes of both analog and digital circuits to achieve an efficient implementation. As shown in Fig. 10, we can realize the integrator plus lead/lag filter function as the sum of two parallel paths which separately implement integration and feedforward paths. Since the integration path requires much less bandwidth than the feedforward path, it is straightforward to achieve an efficient DAC implementation for this portion of the loop filter. We then implement the integrator as a digital accumulator,



**Fig. 9** Approximation of a lead/lag loop filter with a digital implementation

**Fig. 10** Leveraging a hybrid loop filter approach to achieve an efficient integrator plus lead/lag filter implementation



and use a decimator to lower the operating frequency of the accumulator so that its power consumption is reduced [1]. Since the high frequency pole implemented by the feedforward path need not be accurately set, it is straightforward to implement this section with a simple charge pump and RC network. The resulting structure can be implemented much more efficiently than a purely digital approach (which requires a high-bandwidth DAC) or a purely analog approach (which requires a large capacitor,  $C_{\text{int}}$, to realize the integrator).
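A behavioral sketch of the resulting hybrid filter follows, in Python. The integration path sums blocks of phase-detector samples (the decimator) and updates the accumulator only once per block. The feedforward path is a first-order lowpass standing in for the charge pump and RC network. The gains, the pole coefficient `alpha`, and the decimation factor are hypothetical, and floats stand in for the fixed-point accumulator.

```python
from typing import List, Sequence


def hybrid_loop_filter(pd_out: Sequence[float], ki: float, kff: float,
                       alpha: float, decim: int) -> List[float]:
    """Behavioral model of a hybrid integrator + feedforward loop filter.

    Integration path: a decimator sums `decim` phase-detector samples, and
    the digital accumulator (followed by a slow DAC) updates only once per
    block, i.e. at 1/decim of the full rate.
    Feedforward path: a first-order lowpass, y += alpha*(kff*x - y), standing
    in for a charge pump driving an RC network whose pole need not be accurate.
    Returns the summed varactor control signal at full rate.
    """
    acc = 0.0    # digital accumulator (held by the DAC between updates)
    block = 0.0  # decimator partial sum
    ff = 0.0     # analog feedforward node
    out: List[float] = []
    for n, x in enumerate(pd_out):
        block += x
        if (n + 1) % decim == 0:
            acc += ki * block  # low-rate accumulator update
            block = 0.0
        ff += alpha * (kff * x - ff)
        out.append(acc + ff)
    return out


ctrl = hybrid_loop_filter([1.0] * 400, ki=0.01, kff=1.0, alpha=0.5, decim=4)
```

For a constant input, the feedforward node settles to `kff` while the accumulator ramps in steps every `decim` samples. The integration path thus supplies the zero steady-state phase error, and the fast path supplies the stabilizing zero.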

## 5 Leveraging a Digital Approach to Initial Frequency Acquisition Without the Need for an Input Reference Frequency

We now turn our attention to the issue of setting the digital capacitors during initial frequency acquisition of the CDR. In cases where a reference frequency is available, such adjustment is straightforward by use of simple frequency comparison techniques using digital counters [8]. However, in many systems it is more convenient to avoid the need for this reference frequency, which means that initial frequency acquisition must rely on direct comparison of the input data stream and CDR output clock.

Typical reference-less frequency acquisition methods are based on analog techniques which seek to determine the sign of the frequency error, and then integrate the resulting sign signal to move the CDR output frequency in the proper direction [9–13]. Once the frequency error is small enough in magnitude, the feedback action of the CDR will then lock the output clock to the data stream.

Instead of relying on a traditional analog method, let us consider detecting the presence of a frequency error with a purely digital approach. To do so, we propose a heuristic method based on observing the rate of entry into a “forbidden zone” [1], as shown in Fig. 11. The figure illustrates the placement of input data edges relative to the CDR output clock edges under different operating conditions for the CDR. For a well designed CDR, the locked state should yield input data edges that are nominally placed 180 degrees away from the sampling edge of the CDR clock (assumed to be the falling *clk* edges



**Fig. 11** Comparison of input data and CDR output clock edges under different conditions

in the figure) so as to maximize the setup and hold times of the re-timing register within the CDR, and therefore maximize jitter tolerance for the CDR. However, a slight offset away from 180 degrees, as implied in the figure, will generally be acceptable in most applications. As shown in the figure, jitter on the input data signal (or CDR clock output) will cause variation in the relative phase difference between its edges and the CDR output clock edges.

We define the “forbidden zone”, as shown on the left side of Fig. 11, as a range of phase differences between the data input and CDR clock edges which should never occur under locked conditions with low jitter. As jitter is increased in the locked state, as shown in the middle of Fig. 11, entering the forbidden region corresponds to incorrectly re-timing the input data stream such that a bit error may occur. In normal operating conditions, the resulting bit error rate should be low, and is often specified to be  $<10^{-12}$  for applications such as SONET. Finally, under unlocked conditions, as shown on the right side of Fig. 11, there will be repeated entry into the forbidden zone due to the frequency offset associated with being out-of-lock. In such a case, the entry rate into the forbidden zone will be significantly higher than would be encountered in the locked state with reasonable levels of jitter.

To sense frequency offset (and therefore an unlocked CDR state), we can simply monitor the number of times the forbidden zone is entered in a given period of time. If that number exceeds a threshold, which is determined by appropriate statistical analysis, we then declare the CDR to be out-of-lock. In such case, we then update the capacitor setting according to the pattern shown in Fig. 12. At each new setting, we again monitor the number of times the forbidden zone is entered over a given time window, and then change to a new capacitor setting if the number again exceeds the given threshold. Eventually, a capacitor setting will be reached which allows the CDR to obtain lock, at which point the entry rate into the forbidden zone will



**Fig. 12** Proposed pattern of adjusting digital capacitor settings during frequency acquisition

significantly drop such that the count threshold is no longer exceeded. At that point, the CDR is declared to be in lock, and the digitally-controlled capacitor settings are no longer altered.
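The acquisition procedure can be sketched behaviorally in Python under strong simplifications: one data edge per UI, no jitter, and a crude pull-in model in which the CDR locks whenever the residual frequency error is below a hypothetical pull-in range. The window length, zone width, threshold, per-code frequency step, and center-out search order are all hypothetical. In particular, the search order is not the pattern of Fig. 12, and a real threshold would come from the statistical analysis mentioned above.

```python
from typing import Callable, Iterable, Optional


def forbidden_zone_entries(offset_ppm: float, window_ui: int,
                           halfwidth_ui: float = 0.1, locked: bool = False) -> int:
    """Count entries of data edges into the forbidden zone over a window.

    The edge phase relative to the sampling clock edge starts at 0.5 UI
    (180 degrees). Out of lock it drifts by offset_ppm * 1e-6 UI per UI;
    in lock it stays put. The forbidden zone lies within +-halfwidth_ui of
    the sampling edge.
    """
    drift = 0.0 if locked else offset_ppm * 1e-6
    entries = 0
    prev_inside = False
    for n in range(window_ui):
        p = (0.5 + drift * n) % 1.0
        inside = p < halfwidth_ui or p > 1.0 - halfwidth_ui
        if inside and not prev_inside:
            entries += 1
        prev_inside = inside
    return entries


def acquire(search_order: Iterable[int], residual_ppm: Callable[[int], float],
            pull_in_ppm: float = 300.0, window_ui: int = 20000,
            threshold: int = 3) -> Optional[int]:
    """Step through capacitor settings until the entry count stops exceeding the threshold."""
    for code in search_order:
        err = residual_ppm(code)
        locked = abs(err) < pull_in_ppm  # crude stand-in for the CDR pulling in
        if forbidden_zone_entries(err, window_ui, locked=locked) <= threshold:
            return code                  # declared in lock: freeze the setting
    return None


# Hypothetical: each coarse step moves the VCO by 500 ppm; code 17 is correct.
order = [16, 17, 15, 18, 14]             # center-out order, not the pattern of Fig. 12
found = acquire(order, lambda c: (c - 17) * 500.0)
```

A frequency offset of $\Delta$ (as a fraction) makes the relative phase slip one full UI every $1/\Delta$ UI, so the entry count per window grows in proportion to the offset. This is why a simple counter and comparator suffice to flag the out-of-lock state.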

To implement the above approach, we need only a circuit that can sense entry into the forbidden zone, and some simple digital logic that can compare the forbidden zone counts to a pre-defined threshold value. Figure 13 shows one possible means of implementing the forbidden zone entry detection circuit, which consists of augmenting the Hogge detector mentioned earlier with a few extra digital gates. One can see that the implementation is quite simple, and can be operated at very high



**Fig. 13** Proposed implementation of forbidden zone entry detection

frequencies. For additional details on this circuit, as well as the proposed frequency acquisition algorithm, please refer to [1].

## 6 Demonstration of a High Performance CDR Using Mixed-Signal Circuit Techniques

Figure 14 displays the key sections of the proposed CDR structure which leverages mixed-signal techniques to achieve high performance with compact area and low power. As discussed earlier, the key circuits include a combined Hogge detector and first order  $\Sigma-\Delta$  ADC to perform phase comparison with a digital output, a hybrid VCO that is coarse-tuned by digitally-controlled capacitors and fine-tuned by an analog varactor, and a hybrid loop filter that controls the analog varactor through the summation of a digital accumulator path and a higher bandwidth analog feedforward path. The coarse-tuning of the digitally-controlled capacitors is performed by the reference-less frequency acquisition approach discussed in the previous section.

Figure 15 shows a die photo of the CDR, which includes not only the key blocks described above, but also a high speed limiting amplifier and a loss-of-signal (LOS) detector. The chip fits within a 5 mm by 5 mm package, which demonstrates the low area offered by this hybrid implementation. At 2.5 Gbit/s operation, the chip consumes only 170 mA including all output drivers, which yields an overall power dissipation of only 425 mW with a 2.5 V supply in 0.25 μm CMOS.



**Fig. 14** Key components of proposed hybrid CDR implementation



**Fig. 15** Die photo of hybrid CDR in 0.25 μm CMOS

Figure 16 shows the measured frequency acquisition time when adjusting the frequency of the input data stream from 2.5 Gbit/s to 2.4 Gbit/s. We see that the all-digital frequency acquisition approach is able to achieve acquisition times of less than 2 ms without the need for an external frequency reference.

Figure 17 shows the measured eye diagram of the CDR at 2.5 Gbit/s operation under the conditions of a 10 mV peak-to-peak input data signal corresponding to a



**Fig. 16** Measured frequency acquisition time at 2.5 Gbit/s operation



**Fig. 17** Measured eye diagrams at 2.5 Gbit/s operation



**Fig. 18** Measured jitter transfer and tolerance at 2.5 Gbit/s operation

PRBS  $2^{31}$  pattern. The resulting jitter is only 1.4 ps (rms), which is significantly less than the SONET requirement of 4 ps (rms).

Finally, Fig. 18 shows the measured jitter transfer and jitter tolerance performance of the CDR at 2.5 Gbit/s operation. We see that the prototype meets all requirements of the SONET specification, including the jitter peaking requirement of <0.1 dB.

## 7 Conclusions

This chapter presented mixed-signal techniques to achieve high performance CDR circuits. As the first example, a linear, compact, and low power phase-to-digital converter was realized by combining a classical Hogge phase detector with a first order, continuous-time  $\Sigma-\Delta$  modulator that efficiently converts the analog Hogge output to a corresponding digital output. As a second example, the merits of a classical hybrid VCO were discussed, in which digitally-switched capacitors are utilized for coarse tuning and an analog varactor for fine-tuning. In order to achieve efficient control of the fine-tune path of the VCO, a hybrid loop filter was described which leverages a digital accumulator and an analog feedforward path to achieve a compact implementation without the need of a high speed, high resolution DAC circuit. Finally, an all-digital frequency acquisition circuit was described which allows <2 ms acquisition time without the need for an external reference frequency. These various examples highlight the fact that efficient choice of the boundaries between analog and digital circuits can yield a high performance CDR implementation with low area and low power.

## References

1. M.H. Perrott, Y. Huang, R.T. Baird, B.W. Garlepp, D. Pastorello, E.T. King, Q. Yu, D.B. Kasha, P. Steiner, L. Zhang, J. Hein, and B. Del Signore, “A 2.5 Gb/s Multi-Rate 0.25 μm CMOS Clock and Data Recovery Circuit Utilizing a Hybrid Digital Loop Filter and All-Digital Referenceless Frequency Acquisition”, IEEE JSSC, Vol. 41, No. 12, Dec 2006, pp. 2930–2944.
2. R.B. Staszewski, J. Wallberg, S. Rezeq, C.-M. Hung, O. Eliezer, S. Vemulapalli, C. Fernando, K. Maggio, R. Staszewski, N. Barton, M.-C. Lee, P. Cruise, M. Entezari, K. Muhammad, and D. Leipold, “All-Digital PLL and Transmitter for Mobile Phones”, IEEE JSSC, Vol. 40, No. 12, Dec 2005, pp. 2469–2482.
3. J. Alexander, “Clock recovery from random binary signals”, Electronic Letters, Vol. 11, No. 22, 1975, pp. 541–542.
4. C.R. Hogge, “A Self Correcting Clock Recovery Circuit”, IEEE Journal of Lightwave Technology, Vol. LT-3, Dec 1985, pp. 1312–1314.
5. R.B. Staszewski, C.-M. Hung, N. Barton, M.-C. Lee, and D. Leipold, “A Digitally Controlled Oscillator in a 90 nm Digital CMOS Process for Mobile Phones”, IEEE JSSC, Vol. 40, No. 11, Nov 2005, pp. 2203–2211.
6. J.A. Tierno, A.V. Rylyakov, D.J. Friedman, “A Wide Power Supply Range, Wide Tuning Range, All Static CMOS All Digital PLL in 65 nm SOI”, IEEE JSSC, Vol. 43, No. 1, Jan 2008, pp. 42–51.
7. E. Hegazi, H. Sjoland, and A.A. Abidi, "A Filtering Technique to Lower LC Oscillator Phase Noise", IEEE JSSC, Vol. 36, No. 12, Dec 2001, pp. 1921–1930.
8. M. Meghelli, B. Parker, H. Ainspan, and M. Soyuer, "SiGe BiCMOS 3.3-V Clock and Data Recovery Circuits for 10-Gb/s Serial Transmission Systems", IEEE JSSC, Vol. 35, No. 12, Dec 2000, pp. 1992–1995.
9. J. Savoj and B. Razavi, "A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Binary Phase/Frequency Detector", IEEE JSSC, Vol. 38, No. 1, Jan 2003, pp. 13–21.
10. R.-J. Yang, S.-P. Chen, and S.-I. Liu, "A 3.124 Gb/s Clock and Data Recovery Circuit for the 10-Gbase-LX4 Ethernet", IEEE JSSC, Vol. 39, No. 8, Aug 2004, pp. 1356–1360.
11. A. Rezayee and K. Martin, "A 9–16 Gb/s Clock and Data Recovery Circuit with Three-State Phase Detector and Dual-Path Loop Architecture", IEEE ESSCIRC, Sept 2003, pp. 683–686.
12. H. Nosaka, E. Sano, K. Ishii, M. Ida, K. Kurishima, S. Yamahata, T. Shibata, H. Fukuyama, M. Yoneyama, T. Enoki, and M. Muraguchi, "A 39-to-45-Gbit/s multi-data-rate clock and data recovery circuit with a robust lock detector", IEEE JSSC, Vol. 39, No. 8, Aug 2004, pp. 1361–1365.
13. A. Pottbacher, U. Langmann, and H. Schreiber, "A Si Bipolar Phase and Frequency Detector IC for Clock Extraction up to 8 Gb/s", IEEE JSSC, Vol. 27, No. 12, Dec 1992, pp. 1747–1751.

# Jointly Optimize Equalizer and CDR for Multi-Gigabit/s SerDes

Song Wu and Robert Payne

**Abstract** High-speed SerDes links with channel inter-symbol interference (ISI) suffer eye closure in both the vertical and horizontal directions. Equalization optimized to maximize the vertical eye opening as well as to minimize the horizontal jitter is discussed. In particular, the adaptation loops for equalization, CDR, and duty cycle distortion (DCD) correction are jointly optimized. Various circuits for improving the timing accuracy of transceivers, and in particular of phase interpolator based clock and data recovery (CDR), are detailed. Duty cycle distortion, quadrature mismatch, and phase interpolator accuracy are illustrated, and circuit techniques to combat these problems are presented. Sample circuit designs for both small-swing current mode logic (CML) and full-swing CMOS clock distribution are detailed, with an emphasis on differential to single-ended conversion (DSC).

## 1 Introduction

SerDes has become a fundamental I/O device for diverse systems including FPGAs, microprocessors, memory subsystems, hard disk drives, backplane signaling, and video transmission. For economic reasons, silicon products that use SerDes as I/O need to support multiple standards at different data rates to address different market needs. A device from one vendor may also have to interoperate with devices from other vendors that have different specifications.

CDRs have become fundamental building blocks of multi-Gbps SerDes. In addition, as data rates increase, signal integrity becomes a challenge. The losses in the FR4 material of the printed circuit board (PCB) make the transmission line band-limited, and this band-limited transmission introduces a large amount of inter-symbol interference (ISI). Furthermore, via connections between different signal layers can present unterminated stubs that cause reflections. Therefore, sophisticated equalization schemes are needed to compensate for the ISI and reflections present in the received signal. In order to achieve high aggregate throughput, the routing density on the

---

S. Wu (✉)  
Xilinx, Inc., Dallas TX, USA  
e-mail: song.wu@xilinx.com

PCB board is very high even for the high-speed lines. This causes the cross talk from the near end (NEXT) and the far end (FEXT) between adjacent lines to be the dominant noise source. Under certain conditions, the cross talk energy could exceed the signal energy near the Nyquist frequency. The combination of ISI, reflections and cross talk completely closes the eye at the receiver. While adaptive equalization can eliminate most of the ISI, the noise caused by reflections and cross talk will still be present after equalization and dictates the bit error rate (BER). One example of signal attenuation and cross talk from a practical backplane is shown in Fig. 1(A). The DFE, due to its nonlinear nature, offers the fundamental advantage of equalizing the channel without amplifying the noise. [1–4] Also practically, since it is a digital filter, the frequency response scales with clock frequency. The same hardware structure can be used at different data rates simply with clock scaling to meet different standard specifications.

The designs of the CDR and the DFE are interdependent. As shown in Fig. 1(B), the amount of post-cursor ISI that the DFE must cancel depends on the sampling phase. Conversely, the CDR operates on the DFE-conditioned signal, so the amount of DFE applied to the signal also affects the CDR locking phase.

Phase interpolation based CDR [5–7] has become a widespread technique. It has proven to be a predictable, integration-friendly scheme for clock recovery, and it also gives a system the flexibility to add value to multiple applications that require precise or programmable phase control. Phase interpolator clock recovery systems have numerous benefits, including:

- Predictable, digitally controlled frequency tracking capability and CDR loop bandwidth
- Ability to build 1st order phase tracking systems that are inherently stable
- Minimization of integrated PLLs, thus avoiding the possible coupling between multiple CDRs and independent transmit PLLs
- Ability to generate precise phase offsets for measuring input data eyes or for in situ monitoring

Propagating interpolated clocks through the clock distribution network can distort the clock. Transceivers with half-baud-rate or even lower sub-rate clocks suffer from clock duty cycle distortion (DCD) and delay skews. In transmitters, this deterministic jitter (DJ) source distorts the data eye. In receivers, DCD places the data sampling strobe off-center and correspondingly increases the BER. In an interpolator based system, the DCD can differ from one interpolator phase to another because of nonlinear distortion in the interpolator. The duty cycle correction (DCC) circuit therefore has to follow the phase interpolator and correct the DCD at each interpolator phase. Consequently, the duty cycle correction loop resides inside the CDR and DFE loops. I/Q mismatch is another distortion in the clock distribution system. However, this problem can be resolved easily with a system-level eye scan function.

As shown in Fig. 2, the DCC circuit needs to correct the DCD for each CDR interpolator phase. The clean sampling clock from each interpolator output slices the incoming signal and provides data for the DFE to adapt. The updated DFE-equalized waveform is then used to drive a new CDR interpolator phase. The system is iterative and eventually converges to a stable point. To facilitate convergence, however, each component in the loop (the DFE, DCC, CDR, etc.) must be jointly optimized. The latency and the time constant of each sub-loop must be carefully partitioned to achieve global stability.

**Fig. 1** Legacy backplane



**Fig. 2** Coupled adaptive equalizer, eye scan, CDR and DCC loops

## 2 Phase Interpolation Based CDR

Phase interpolation based CDR has become a preferred technique for generating the data sampling clocks in a wide range of clock and data recovery systems. By its nature, the digital phase interpolator used in these systems quantizes the ideal slicing location (strobe instant) of a CDR receiver, much as a DAC quantizes a continuous voltage waveform into discrete voltage steps. Many techniques exist for updating the placement of clock edges within a data eye, but the ultimate timing accuracy depends on the quality of the phase interpolation.

Figure 3 illustrates the typical phase interpolator based CDR architecture. The PLL provides stable multiphase clocks, typically 4 or 8 phases, locked to a local reference clock. Separate I and Q interpolators generate the I-sampling phase at the data crossing point and the Q-sampling phase at the point of maximum vertical eye opening. The early/late voting logic decides whether the sampling clock leads or lags the incoming data. The digital loop filter outputs separate I and Q interpolator codes to correct the sampling timing. The phase difference between the Q clock and the I clock can be set to arbitrary values. In the simplest CDR circuits, the I and Q sampling clocks are nominally separated by $90^\circ$, on the assumption that this gives the data sampling instant with the lowest BER. Because I and Q can be programmed separately, the Q clock can be swept across the entire UI to measure the eye timing margins or to find the optimum data sampling position. This technique is referred to as eye scan.

**Fig. 3** Interpolator based CDR

The independent I/Q interpolators, together with horizontal eye scan, align the Q sampling clock optimally in the data eye in spite of any skew in the shape of the eye. This feature also removes the need to maintain a quadrature relationship between the I and Q clocks, because the exact timing is adapted to the eye shape.
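The eye scan idea can be illustrated with a small behavioral sketch. The Q code is swept over one UI, errors are counted at each code, and the centre of the widest error-free window is chosen. The resolution of 64 codes per UI, the eye edges at 0.2 and 0.9 UI, and the Gaussian crossing jitter are all hypothetical values. They are chosen so that the eye centre (0.55 UI) is deliberately not at the nominal quadrature point.

```python
import random

STEPS_PER_UI = 64  # hypothetical number of interpolator codes per UI


def count_errors(code, left=0.2, right=0.9, jitter_rms=0.02, n_bits=2000, seed=0):
    """Errors when the Q clock samples an always-toggling pattern at `code`.

    The eye is modelled (assumption) as open between a left crossing near
    `left` UI and a right crossing near `right` UI, each with Gaussian jitter,
    so the eye centre sits at 0.55 UI rather than at the nominal 0.5 UI.
    """
    rng = random.Random(seed)            # same jitter record for every code
    phase = code / STEPS_PER_UI          # sampling instant within the UI
    errors = 0
    for _ in range(n_bits):
        l_edge = left + rng.gauss(0.0, jitter_rms)
        r_edge = right + rng.gauss(0.0, jitter_rms)
        if not (l_edge < phase < r_edge):
            errors += 1                  # sampled outside the open eye
    return errors


def eye_scan():
    """Sweep the Q code over one UI; return (centre code, error-free width in codes)."""
    clean = [count_errors(c) == 0 for c in range(STEPS_PER_UI)]
    best_start, best_len, run_start = 0, 0, None
    for c, ok in enumerate(clean + [False]):   # sentinel closes a trailing run
        if ok and run_start is None:
            run_start = c
        elif not ok and run_start is not None:
            if c - run_start > best_len:
                best_start, best_len = run_start, c - run_start
            run_start = None
    return best_start + best_len // 2, best_len


centre, width = eye_scan()
print(centre, width)
```

The selected code lands near 0.55 UI (about code 35) rather than at code 32. This is the sense in which eye scan removes the need for an exact I/Q quadrature relationship.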

The tracking capability of an interpolator based CDR is usually limited by the latency of the digital update circuits and by the interpolator step size. To minimize power consumption and to allow reasonable clock rates for the synthesized digital hardware, the digital early/late voting circuits typically operate on slower, de-multiplexed data. The resulting latency is typically tens of UIs. In addition, the interpolator step size needs to be small to minimize hunting jitter. Bang-bang phase detection is commonly used with phase interpolator based CDRs. Because a bang-bang phase detector is nonlinear, the interpolator based CDR typically uses a 1st-order loop. A 2nd-order loop implementation is also possible for certain applications with a constant frequency offset. In these cases, loop stability must be carefully considered in the design of the 2nd-order accumulator.
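The interaction between bang-bang detection, update latency and step size can be sketched with a 1st-order loop that moves the interpolator code by one step per update. This is a minimal behavioral model, not the hardware implementation. The 64 codes per UI and the 4-update latency are hypothetical values, and the latency is expressed in loop-update periods rather than in UIs.

```python
STEPS_PER_UI = 64   # hypothetical interpolator codes per UI
LATENCY = 4         # hypothetical decision-to-update latency, in update periods


def run_cdr(edge_codes, start_code=0):
    """First-order bang-bang CDR acting directly on a phase interpolator code.

    edge_codes[n] is the position of the data crossing at update n, in
    interpolator codes; the loop tries to place the I clock on it. Each
    early/late decision is applied LATENCY updates after it is made.
    """
    code = start_code
    pipeline = [0] * LATENCY             # decisions still in flight
    trace = []
    half = STEPS_PER_UI // 2
    for edge in edge_codes:
        err = (edge - code + half) % STEPS_PER_UI - half   # wrapped phase error
        # A bang-bang detector always decides: +1 = clock early (delay it),
        # -1 = clock late (advance it).
        pipeline.append(1 if err >= 0 else -1)
        code = (code + pipeline.pop(0)) % STEPS_PER_UI      # one step per update
        trace.append(code)
    return trace


trace = run_cdr([20] * 400)
print(min(trace[-100:]), max(trace[-100:]))   # hunting band around code 20
```

After acquisition the code hunts around the data edge. The peak-to-peak band grows with the latency, because decisions already in the pipeline keep pushing the code past the edge. This is why both a small step size and a short update latency are desirable.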

The basic structure of the phase interpolator cell is shown in Fig. 4(A). Two pairs of complementary clock phases, $C_1$/$\bar{C}_1$ and $C_2$/$\bar{C}_2$, are mixed with adjustable weight currents $I$ and $1-I$ to produce the output clock phases $O$ and $\bar{O}$. The outputs $O$ and $\bar{O}$ can be steered continuously from phase $C_1$/$\bar{C}_1$ to $C_2$/$\bar{C}_2$ by varying the current $I$ from 1 to 0. For this mixer-type interpolator to work properly, the input waveforms $C_1$/$\bar{C}_1$ and $C_2$/$\bar{C}_2$ and the output waveforms $O$/$\bar{O}$ need to be sinusoidal. The interpolator therefore needs to be bandwidth limited to filter out any nonlinear distortion. The interpolator inputs $C_1$/$\bar{C}_1$ and $C_2$/$\bar{C}_2$ are delivered from the PLL through multiple buffers and muxes, as shown in Fig. 4(B), so they can also suffer from nonlinear distortion in each component along the path. The same bandwidth-limiting filter therefore needs to be placed on each of these components to prevent waveform distortion. Such a filter appears difficult to implement. Its bandwidth depends on process, voltage, and temperature (PVT), and it also has to be adjusted for each data rate at which the circuit operates. The most appealing approach is to build each buffer, mux, and interpolator as a replica of the PLL delay cell in the VCO, shown in Fig. 4(C). The VCO delay cell has to be biased properly to provide the precise phase delay at the given data rate and PVT condition, so it provides exactly the filter function needed.

(A) Interpolator Mixer

(B) CML Clock Distribution

(C) PLL and VCO Delay Cell

**Fig. 4** Phase interpolator and clock distribution
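The need for sinusoidal inputs can be checked numerically on the mixer of Fig. 4(A). The sketch forms $O = w \cdot C_1 + (1-w) \cdot C_2$ with $C_2$ lagging $C_1$ by 90° and locates the rising zero crossing of $O$. The weight $w$ stands in for the normalized current $I$. The 90° spacing (adjacent phases of a 4-phase PLL) and the hard-limited "square" input, used as a crude stand-in for a clock that is not band-limited, are assumptions of this illustration.

```python
import math


def _square(x_deg):
    """Hard-limited sin(x): a crude stand-in for a non-band-limited clock."""
    return 1.0 if math.sin(math.radians(x_deg)) >= 0 else -1.0


def _sine(x_deg):
    return math.sin(math.radians(x_deg))


def interp_phase(w, sep_deg=90.0, square=False, step=0.01):
    """Phase (deg) of the rising zero crossing of O = w*C1 + (1 - w)*C2.

    C1 = f(x) and C2 = f(x - sep_deg), i.e. C2 lags C1 by sep_deg; w plays
    the role of the normalised weight current I in Fig. 4(A).
    """
    f = _square if square else _sine
    x = -90.0
    prev = w * f(x) + (1 - w) * f(x - sep_deg)
    while x < 270.0:
        x += step
        cur = w * f(x) + (1 - w) * f(x - sep_deg)
        if prev < 0 <= cur:
            return x
        prev = cur
    raise ValueError("no rising zero crossing found")


for w in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(f"I={w:.2f}  sine: {interp_phase(w):6.2f} deg  "
          f"ideal: {(1 - w) * 90:6.2f} deg  square: {interp_phase(w, square=True):6.2f} deg")
```

With sinusoidal inputs the output phase moves smoothly from 0° to 90°. It deviates from the ideal linear law by a few degrees mid-range, which is the interpolator's inherent nonlinearity. With square inputs the output snaps to either $C_1$ or $C_2$, so there is no interpolation at all.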

## 3 Differential to Single-Ended Conversion and Duty Cycle Correction

Transceivers often require full-swing CMOS clock signals to minimize power consumption and to maximize the sensitivity of certain front-end circuits, such as sense amplifiers. The small-swing CML levels generated by the phase interpolators must therefore be converted to CMOS levels. Offsets accumulated in the clock distribution network and in the CML-to-CMOS comparators generally cause significant duty cycle distortion. In half-baud-rate or lower sub-rate systems, sampling uses both (or multiple) clock edges, so a duty cycle distortion correction circuit is needed to remove the duty cycle errors. The digital early/late voting engine updates the interpolator at the byte rate, and the duty cycle distortion introduced at each interpolator code can differ. The receiver DCC loop therefore needs to settle within a few clock cycles, so that the sampling clock is free of duty cycle error at each interpolator code.

Figure 5 illustrates the evolution of the CML-to-CMOS conversion circuit. Figure 5(A) shows a simple comparator-based design. The duty cycle at the output inverter depends on its input DC bias. This DC bias level depends on the offset of the differential pair and is sensitive to process, voltage, and temperature variations. In addition, the rising and falling edges see different path delays, because there are two current mirrors in the pull-down path and only a single mirror in the pull-up path.

Figure 5(B) adds two current sources that both increase the speed of the comparator and provide a means of duty cycle correction. These current sources keep all transistors in a constant "On" state. This reduces the voltage excursions at nodes A, B, and C and hence relaxes any slew rate limitations. In addition, the duty cycle can be detected at the output of the comparator (for example, simply by extracting the DC component of the clock). The strengths of the two current sources can then be adjusted to correct any accumulated duty cycle errors.
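The DC-extraction idea can be sketched as an integrating loop. For a ±1 clock the DC value equals $2D-1$, where $D$ is the duty cycle, so it is zero exactly at 50 %. A loop that accumulates this DC value can therefore cancel a comparator offset. In this behavioral model the two adjustable current sources are abstracted as a single threshold correction `v_corr`. The unit-sine input, the 0.3 offset and the loop gain `mu` are hypothetical values.

```python
import math


def comparator_duty(v_threshold, samples=1000):
    """Fraction of one period for which a unit sine input exceeds v_threshold."""
    high = sum(1 for k in range(samples)
               if math.sin(2 * math.pi * k / samples) > v_threshold)
    return high / samples


def run_dcc(v_offset=0.3, mu=0.5, iterations=40):
    """Integrating duty-cycle correction loop (behavioural sketch).

    The clock's DC value (2*duty - 1 for a +/-1 swing) is the error signal;
    the loop accumulates it into a correction that cancels the offset.
    """
    v_corr = 0.0
    history = []
    for _ in range(iterations):
        duty = comparator_duty(v_offset - v_corr)   # effective slicing threshold
        dc = 2 * duty - 1                           # 0 when the duty cycle is 50 %
        v_corr -= mu * dc                           # integrate the DC error
        history.append(duty)
    return v_corr, history


v_corr, history = run_dcc()
print(f"correction = {v_corr:.3f}, first duty = {history[0]:.3f}, last duty = {history[-1]:.3f}")
```

A larger `mu` settles in fewer iterations but dithers more around 50 %. This mirrors the trade-off between averaging time and the fast DCC convergence that the CDR requires.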

Figure 5(C) further enhances the design with frequency-dependent current mirrors that act as active inductive loads for the comparator. These provide significant gain to the voltage swings at nodes A, B, and C, which allows the input of the final inverter to toggle even faster with reduced currents. They can also reduce the latency of the duty cycle correction loop.

The final design in Fig. 5(D) recognizes that, in deep submicron processes with supply voltages in the 1 V range, the voltage swing at nodes B and C is comparable to the CML voltage swing. The comparator therefore does not actually provide much gain. Instead, it sets up the DC bias levels for the output inverter. Since there is limited voltage gain, the circuit can be simplified to that of Fig. 5(D): an inverter is biased at its switching threshold by a large-value resistor, which gives tight control of its output duty cycle, and the input CML signal is capacitively coupled. Duty cycle correction can also be added to this circuit if required.

**Fig. 5** Differential to single-ended conversion circuits

## 4 Decision Feedback Equalizer Design Considerations

Decision feedback equalization is required in many applications with data rates above 6 Gbps, because of both the severe attenuation and the strong cross talk present in the channel [1]. Figure 6(A) shows the architecture of a typical DFE. A feedback FIR filter, with adaptive tap weights applied to the previously detected bits, reproduces the post-cursor ISI and cancels it from the incoming signal. Mathematically, the process is represented by Eq. (1) for the $m$-th sample, where $\varphi(t)$ is the symbol response and $t_{\max}$ is the sampling phase at the eye opening point.

**Fig. 6** Decision feedback equalizer. **(A)**: DFE architecture. **(B)**: DFE tap feedback timing relation

$$\begin{aligned} z_m = a_m \cdot \varphi(t_{\max}) + \sum_{i=1}^I a_{m+i} \cdot \varphi(t_{\max} - iT) \\ + \sum_{k=0...} a_{m-1-k} \cdot \{\varphi(t_{\max} + T + kT) - dfe_{k+1}\} \end{aligned} \quad (1)$$

If all the summation terms over $k$ vanish, the post-cursor ISI is cancelled. The channel attenuation is low-pass and the cross talk is high-pass in nature, so the high-pass DFE selectively equalizes the signal without amplifying the cross talk. The result is an increase in SNR.

At the eye transition point, the filter can also help to reduce the jitter if the DFE feedback signal is inserted properly. Figure 6(B) shows the restriction: if the incoming data pattern is a clock pattern (alternating bits), the DFE feedback signal is also a clock pattern. To maintain phase synchronization, the DFE feedback needs to have the same zero-crossing point, so that the superposition of the two waveforms does not shift the time stamp of the transition. With this constraint, the crossing-point sample can be expressed as

$$\begin{aligned} z_{m-1/2} = \sum_{k=0...} a_{m-1-k} \cdot \varphi(t_{\max} + T/2 + kT) + a_m \cdot \varphi(t_{\max} - T/2) \\ + \sum_{i=1}^I a_{m+i} \cdot \varphi(t_{\max} - T/2 - iT) - \sum_{k=0...} dfe_{k+1} \cdot (\hat{a}_{m-1-k} + \hat{a}_{m-2-k})/2 \end{aligned} \quad (2)$$

For a transition, i.e., $a_m = -a_{m-1}$, Eq. (2) becomes

$$\begin{aligned} z_{m-1/2} = a_m \cdot \{\varphi(t_{\max} - T/2) - \varphi(t_{\max} + T/2) + dfe_1/2\} \\ + \sum_{k=1...} a_{m-1-k} \cdot \{\varphi(t_{\max} + T/2 + kT) - (dfe_k + dfe_{k+1})/2\} \\ + \sum_{i=1}^I a_{m+i} \cdot \varphi(t_{\max} - T/2 - iT) \end{aligned} \quad (3)$$

The DFE taps can be set either to maximize the vertical eye opening, by cancelling the post-cursor terms of Eq. (1), or to maximize the horizontal eye opening, by minimizing Eq. (3). The maximum eye height criterion results in the tap values

$$dfe_{k+1} = \varphi(t_{\max} + T + kT) \quad (4)$$

i.e., the post-cursor ISI.

The maximum eye width criterion gives the tap values

$$\begin{aligned} dfe_1 &= 2 \cdot \{\varphi(t_{\max} + T/2) - \varphi(t_{\max} - T/2)\} \\ dfe_k + dfe_{k+1} &= 2 \cdot \varphi(t_{\max} + T/2 + kT) \quad k \geq 1 \end{aligned} \quad (5)$$
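The two criteria can be compared on a concrete symbol response. The sketch evaluates Eq. (4) directly and solves Eq. (5) recursively, since each $dfe_{k+1}$ follows from $dfe_k$. The symbol response $\varphi(t)$, a 1-UI pulse through a single RC pole with $\tau = 1$ UI, is a hypothetical channel chosen for illustration. For this response $t_{\max} = T$.

```python
import math

T = 1.0      # one UI (normalised)
TAU = 1.0    # hypothetical single-pole channel time constant, in UI


def phi(t):
    """Assumed symbol response: a 1-UI pulse through a single RC pole."""
    if t <= 0:
        return 0.0
    if t <= T:
        return 1 - math.exp(-t / TAU)
    return (1 - math.exp(-T / TAU)) * math.exp(-(t - T) / TAU)


T_MAX = T    # this assumed response peaks at the end of the pulse


def taps_vertical(n_taps):
    """Eq. (4): dfe_{k+1} = phi(t_max + T + kT), i.e. the post-cursor ISI."""
    return [phi(T_MAX + T + k * T) for k in range(n_taps)]


def taps_horizontal(n_taps):
    """Eq. (5): zero the transition-sample ISI terms of Eq. (3)."""
    taps = [2 * (phi(T_MAX + T / 2) - phi(T_MAX - T / 2))]       # dfe_1
    for k in range(1, n_taps):
        # dfe_k + dfe_{k+1} = 2 phi(t_max + T/2 + kT); taps[k - 1] is dfe_k
        taps.append(2 * phi(T_MAX + T / 2 + k * T) - taps[k - 1])
    return taps


print("vertical:  ", [round(v, 4) for v in taps_vertical(4)])
print("horizontal:", [round(v, 4) for v in taps_horizontal(4)])
```

For this assumed channel the two tap sets differ noticeably. Maximizing eye height and maximizing eye width are therefore genuinely different adaptation targets.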

Both the horizontally and the vertically optimized adaptation schemes can be implemented with the least mean square (LMS) algorithm. The horizontal eye width sensing can easily reuse the early/late logic already present in the CDR circuit, as shown in Fig. 7. (A) If two consecutive edges both show Early, the CDR needs to delay the clock. (B) If the front edge is Early and the following edge is Late, the CDR on average does not update; however, for this pattern the eye width is less than one UI and the channel is under-equalized. (C) If two consecutive edges both show Late, the CDR needs to advance the clock. (D) If the leading edge sample is Late and the trailing edge sample is Early, the CDR does not need to change; however, the eye is wider than one UI and the channel is over-equalized.
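The four cases of Fig. 7 map directly to a lookup from pairs of early/late verdicts to a CDR step and an equalizer adjustment. The `adapt` helper below is a hypothetical sign-sign accumulation into a single equalizer gain with an assumed step `mu`. It illustrates how the same early/late votes can drive both loops, but it is not the GTX adaptation logic.

```python
def classify(first, second):
    """Map early/late verdicts of two consecutive edges to (cdr_step, eq_step).

    cdr_step: +1 delays the clock, -1 advances it, 0 leaves it alone.
    eq_step:  +1 increase equalization, -1 decrease it, 0 no change.
    """
    table = {
        ("E", "E"): (+1, 0),   # (A) clock early: delay it
        ("E", "L"): (0, +1),   # (B) eye narrower than 1 UI: under-equalized
        ("L", "L"): (-1, 0),   # (C) clock late: advance it
        ("L", "E"): (0, -1),   # (D) eye wider than 1 UI: over-equalized
    }
    try:
        return table[(first, second)]
    except KeyError:
        raise ValueError("verdicts must be 'E' or 'L'") from None


def adapt(pairs, tap=0.0, mu=0.01):
    """Hypothetical sign-sign update: accumulate CDR steps and one equalizer gain."""
    clock = 0
    for first, second in pairs:
        cdr_step, eq_step = classify(first, second)
        clock += cdr_step
        tap += mu * eq_step
    return clock, tap


print(adapt([("E", "L")] * 10))   # under-equalized pattern: gain rises, clock holds
```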

Vertical eye height sensing typically needs an auxiliary eye scan circuit, as shown in Fig. 6(A). The auxiliary circuit creates a redundant parallel signal path that removes an increasing amount of voltage margin from the signal until its decision fails. The vertical eye height is then measured directly as the maximum voltage margin that can be removed from the signal.
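A minimal sketch of this margin sweep follows. The main slicer decides the sign of each sample. The auxiliary path pushes the sample towards the threshold by an increasing margin $v$ and stops at the first $v$ where any auxiliary decision flips. The sample values and the 0.01 step are hypothetical.

```python
def eye_margin(samples, step=0.01, v_max=2.0):
    """Largest voltage margin the auxiliary path can remove before a decision fails.

    For each sample y the main slicer decides d = sign(y); the auxiliary
    path slices y - d*v, i.e. it pushes the sample towards the threshold by v.
    """
    def sign(y):
        return 1 if y >= 0 else -1

    n = 0
    while (n + 1) * step <= v_max:
        v = (n + 1) * step
        if any(sign(y - sign(y) * v) != sign(y) for y in samples):
            break                        # auxiliary decision flipped: eye closed
        n += 1
    return n * step


print(eye_margin([0.8, -0.75, 0.9, -1.0, 0.62, -0.7]))
```

The result tracks the smallest sample magnitude, to within one step. That magnitude is the voltage margin on the worst side of the eye.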

The timing of the DFE feedback control can be accommodated in the sense amplifier design. Most high-speed sense amplifier designs use a core sense amplifier that generates a pulse according to the input data polarity, followed by a set-reset (SR) latch that captures the result. However, even in an optimized design, the added latch delay is still too long to meet the critical timing path.

**Fig. 7** Early/Late sensing for CDR and Equalizer

To satisfy the speed and latch timing requirements of the DFE, the SR outputs of the sense amplifier are buffered by a pair of clocked inverters and parallel hold latches, as shown in Fig. 8(A), and processed directly in the DFE. Hysteresis is minimized by precharging and shorting all internal differential nodes of the sense amplifier in Fig. 8(A). This also reduces the impact of device mismatch on the input offset. During the precharge state, the clocked inverters isolate the sense amplifier from the output latches, which minimizes hysteresis while keeping device sizes modest. The inverters also reduce the load seen by the core sense amplifier, provide the drive strength needed to charge and discharge the feedback capacitance, and distribute the gain to minimize the overall delay of the sense amplifier. The parallel latches hold the decision until it is no longer needed in the feedback loop, and they are reset using a combination of the sampling clock and DFECLK. With the DFECLK hold pulse, the output of the sense amplifier can therefore be held as long as needed.

The gating of the DFE feedback timing can then be implemented easily with the half-baud-rate receiver architecture shown in Fig. 9. As shown in Fig. 9(A), two samplers clocked by the two half-rate clock phases *CLK90* and *CLK270* alternately sample the signal. The DFE tap mux, controlled by *DFECLK*, ping-pongs the half-rate sampled signals back to the *RXEQ* node to form a full-rate feedback signal. This 2-to-1 mux function, shown in Fig. 9(B), can also implement the DFE tap weight when combined with a current DAC.

(A) Sense Amplifier Circuit

(B) Clock Timing Relationship

**Fig. 8** Sense Amplifier

(A) Half baud DFE architecture.

(B) DFE feedback 2 to 1 mux and current DAC.

**Fig. 9** Half baud DFE implementation
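The ping-pong feedback can be modeled behaviorally to show that two half-rate samplers plus a 2-to-1 mux reproduce a full-rate DFE. The first-tap feedback for every bit must come from the *other* sampler, which is why the mux has to toggle every UI. The channel cursors, the bit pattern and the ideal tap setting are hypothetical, and all circuit timing is abstracted away.

```python
import random


def channel(bits, h):
    """Received samples x[n] = sum_j h[j] * a[n - j]; h[0] is the main cursor."""
    return [sum(h[j] * bits[n - j] for j in range(len(h)) if n - j >= 0)
            for n in range(len(bits))]


def dfe_full_rate(x, taps):
    """Reference full-rate DFE: one slicer, feedback from past decisions."""
    d = []
    for n, xn in enumerate(x):
        fb = sum(taps[k] * d[n - 1 - k] for k in range(len(taps)) if n - 1 - k >= 0)
        d.append(1 if xn - fb >= 0 else -1)
    return d


def dfe_half_rate(x, taps):
    """Two half-rate samplers: even bits on CLK90, odd bits on CLK270.

    The 2-to-1 feedback mux (toggled every UI, like DFECLK) fetches each past
    decision from the sampler that produced it; the first tap therefore
    always comes from the *other* sampler.
    """
    even, odd = {}, {}                 # decisions held by each sampler path

    def mux(i):
        return even[i] if i % 2 == 0 else odd[i]

    for n, xn in enumerate(x):
        fb = sum(taps[k] * mux(n - 1 - k) for k in range(len(taps)) if n - 1 - k >= 0)
        (even if n % 2 == 0 else odd)[n] = 1 if xn - fb >= 0 else -1
    return [mux(n) for n in range(len(x))]


rng = random.Random(7)
bits = [rng.choice((-1, 1)) for _ in range(200)]
h = [1.0, 0.45, 0.2, 0.1]            # hypothetical pulse-response cursors
x = channel(bits, h)
print(dfe_half_rate(x, h[1:]) == bits)
```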

## 5 Conclusion

The continuous improvement in CMOS process technology has enabled the integration of both low-power CDRs and high-performance equalization. To achieve optimal system performance, the functions on both the signal path and the clock path have to be designed jointly. For half-rate receivers, the stringent I/Q relationship in the clock distribution can be relaxed by introducing a horizontal eye scan circuit. In this example, a circuit design task is resolved through broader system-level considerations. On the other hand, the design of the duty cycle correction (DCC) circuits has to consider not only the DCD performance itself but also the limits imposed by CDR loop stability. A trade-off has to be made between good DCD performance, which can be achieved with a longer averaging time, and CDR tracking bandwidth, which requires faster DCC convergence.

At the circuit level, the timing of the DFE feedback needs to be carefully calibrated, so that the transition point is sampled exactly by the I-sampler when the DFE feedback signal alternates polarity. This constraint on the DFE feedback timing reduces the perturbation of the sampling phase. At the system level, the DFE taps need to adapt to the channel pulse response, and in a sampled system the channel response also depends on the sampling phase. The DFE adaptation loop therefore needs to be updated more slowly than the CDR loop. At a steady CDR sampling phase, DFE adaptation removes ISI and yields a lower-jitter signal. Lower jitter also reduces wander in the phase recovered by the CDR. Because the DFE loop is kept slow, the DFE taps and the channel appear steady to the CDR loop while it hunts for a new phase. All of the above considerations address the various loop dynamics shown in Fig. 2.

The design guidance discussed in this paper has been verified in the multi-gigabit GTX transceiver of Xilinx's newly released 65 nm Virtex-5 FXT FPGA. The GTX supports 19 different serial data transmission standards, covering a range from 150 Mbps to 6.5 Gbps, while consuming less than 200 mW per channel at the highest speed. The receiver uses a four-tap decision feedback equalizer plus a linear equalizer. In the lab, the transceiver demonstrated error-free operation over a 65-inch FR4 board with two connectors, using a PRBS31 pattern at the highest data rate [8].

## References

1. S. Wu, et al., "Design of a 6.25-Gbps Backplane SerDes with Top-Down Methodology," Design and Test for Multiple Gbps Communication Devices and Systems, IEC, 2005, p. 283.
2. R. Payne, et al., "A 6.25-Gb/s binary transceiver in 0.13 $\mu\text{m}$ CMOS for serial data transmission across high loss legacy backplane channels," IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2646–2657, Dec. 2005.
3. J. L. Zerbe, et al., "Equalization and clock recovery for a 2.5–10 Gbps 2-PAM/4-PAM backplane transceiver cell," IEEE J. Solid-State Circuits, vol. 38, no. 12, pp. 2121–2130, Dec. 2003.
4. V. Stojanovic, et al., "Adaptive equalization and data recovery in a dual-mode (PAM2/4) serial link transceiver," in Proc. VLSI Circuit Symp., 2004.
5. J. Sonntag, et al., "A digital clock and data recovery architecture for multi-gigabit/s binary links," IEEE J. Solid-State Circuits, vol. 41, no. 8, p. 1867, Aug. 2006.
6. C. Kromer, et al., "A 25-Gb/s CDR in 90-nm CMOS for high-density interconnects," IEEE J. Solid-State Circuits, vol. 41, no. 12, p. 2921, Dec. 2006.
7. A. Fiedler, et al., "A 1.0625 Gb/s transceiver with 2x-oversampling and transmit signal pre-emphasis," ISSCC Digest of Technical Papers, pp. 238–239, Feb. 1997.
8. Virtex-5 FPGA RocketIO GTX Transceiver User Guide (UG198), Xilinx.

# Time to Digital Conversion: An Alternative View on Synchronization

J. Daniels, W. Dehaene and M. Steyaert

**Abstract** A new scheme for a fully-digital Clock and Data Recovery (CDR) circuit is proposed that combines immediate acquisition with a continuous frequency range. It uses a Time-to-Digital Converter (TDC) that measures the positions of the incoming data edges on the time axis. This time information is passed to a decision block, which extracts the clock and data information. A waveform generator then reconstructs the original data along with the synchronized clock. A survey of TDCs is given, followed by a comparison of the new CDR architecture with known topologies. A design example of a prototype TDC with a wide operating range and adjustable time resolution is presented.

## 1 Introduction

Time-to-Digital Converters have been reported for various applications [1]. Typical examples are pulsed time-of-flight laser radars used in traffic speed cameras, millimeter-precision object detection and localization, anti-collision radars, and proximity sensors. These applications mostly require a very precise single-shot measurement over a high dynamic range.

With the downscaling of the minimum feature size in modern submicron CMOS technologies, TDCs have proven very useful in other applications as well, namely where it is profitable to replace poorly scaling analog circuits with TDCs. Technology scaling implies voltage scaling while noise does not scale along, and variability becomes more important. Analog circuits therefore require more design effort, which mostly leads to increased power consumption [2, 3]. Digital speed, however, does scale with technology. The robustness of digital circuits is much better preserved, although it is also compromised by variability. Time-to-digital converters profit directly from the enhanced speed, so switching from the analog to the (digital) time domain can significantly reduce the power consumption for equal performance, especially for designs in sub-100 nm technology nodes. Applications that use TDCs in this context include asynchronous Delta-Sigma A/D converters [4] and digital DLLs/PLLs for clock generation and data recovery [5–8].

---

J. Daniels (✉)  
ESAT-MICAS, K.U. Leuven, Heverlee, Belgium

This paper concentrates on CDR circuits. More specifically, a new scheme is proposed for a fully-digital CDR circuit (see Fig. 1). It uses a TDC as a high-precision time measurement device that analyzes the incoming data waveform and controls a waveform generator, which reconstructs the original data along with a synchronized clock. It will be shown that this approach combines immediate acquisition with a wide frequency range.

The paper first gives a survey of TDCs in general. It then describes the use of a TDC in CDRs in more detail and compares it with other CDR topologies. Finally, a design example of a prototype TDC in 0.13 $\mu\text{m}$ CMOS technology is presented.



**Fig. 1** Fully-Digital Data Synchronization scheme

## 2 Survey on Time-to-Digital Converters

### 2.1 Principle and Terminology

The TDC's basic task is to measure the timing information of one or more discrete-amplitude signals and to provide a digital representation for further processing. The signals to be measured must have only two amplitude levels (low and high), with clearly defined transitions (edges) between these levels. The signals can be either continuous in time (start-stop signals, PWM signals, etc.) or discrete in time, i.e., at a fixed rate (digital clock and data signals).

Depending on the application, the time information to be measured can be the time difference between the rising edges of a start and a stop signal, the width of a pulse, or the location of the edges relative to a reference signal (see Fig. 2). The TDC can be designed to conduct either a single-shot measurement, e.g., for a start-stop signal, or measurements at a continuous rate, e.g., for clock synchronization.

Finally, the application can require either a relative or an absolute time measurement.

The precision with which this information is measured is defined as the resolution of the TDC. The maximum value of the digital output determines its dynamic range. Table 1 gives an overview of a number of applications and the type of measurement each involves.

**Fig. 2** Sorts of time measurements: pulse width (1), time difference between a start and a stop signal (2), location of the rising (3a) and falling (3b) edge relative to a reference signal



**Table 1** Overview of some applications that apply time measurement

| Application                | Signal type     | Measurement rate | Information   | Output                      |
|----------------------------|-----------------|------------------|---------------|-----------------------------|
| Time-of-flight measurement | Continuous-time | Single-shot      | Time interval | Absolute time               |
| PWM measuring              | Continuous-time | Continuous       | Edge location | Relative to data period     |
| DPLL / CDR                 | Discrete-time   | Continuous       | Edge location | Relative to reference clock |

### 2.2 Basic Architectures

A straightforward method to measure a time interval is to use a digital counter that counts the number of clock periods during the interval. The resolution of this method, however, is directly related to the clock frequency. This technique can therefore achieve at best a resolution of a few hundred ps in standard CMOS technology.
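The resolution limit of the counter approach follows from simple arithmetic: the quantization step equals the clock period. The sketch below assumes the interval starts on a clock edge and uses a hypothetical 2 GHz counter clock.

```python
import math


def counter_tdc(interval_ps, f_clk_ghz):
    """Count the whole clock periods inside an interval (edge-aligned start assumed).

    Returns (count, lsb_ps); the quantization step equals the clock period.
    """
    lsb_ps = 1e3 / f_clk_ghz             # clock period in ps
    return math.floor(interval_ps / lsb_ps), lsb_ps


count, lsb = counter_tdc(1234.0, 2.0)
print(count, lsb, count * lsb)           # the 234 ps remainder is lost
```

Even at 2 GHz the LSB is 500 ps, which illustrates why the counter alone cannot reach sub-100 ps resolution.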

The first actual Time-to-Digital Converter circuits in the literature were in fact Time-to-Amplitude Converters. They used, e.g., a charge pump to convert the time information into an analog voltage, which could then be digitized with an ADC. However, as pointed out before, these analog TDCs suffer from technology scaling, which makes them less attractive. Most TDCs in the literature therefore use a more digital approach to measure the time information.

The most popular approach uses digital delay lines (see Fig. 3a). The input signal propagates through a chain of delay elements, and the output node of each element is connected to a flip-flop. A reference signal drives the clock input of every flip-flop, so each delayed version of the input signal is sampled at the rising edge of the reference. The flip-flop outputs then represent the position of the signal transition relative to the reference in thermometer code.



**Fig. 3** TDC with a digital delay line: **(a)** single-ended, **(b)** differential

A thermometer-to-binary encoder can be used to obtain a binary value of the measured time information.
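The delay-line TDC and its encoder can be sketched behaviorally with ideal, mismatch-free stages. Stage $k$ has switched at the reference edge if the input edge has had time to propagate through it. Counting the leading ones then gives the binary result. The 20 ps unit delay, the 16 stages and the 95 ps edge position are hypothetical values.

```python
def delay_line_tdc(edge_ps, n_stages, t0_ps):
    """Thermometer code sampled at the reference edge (t = 0).

    The input edge occurs `edge_ps` before the reference edge; flip-flop k
    sees the input delayed by (k + 1) * t0_ps, so it reads 1 once the edge
    has propagated through that stage.
    """
    return [1 if edge_ps >= (k + 1) * t0_ps else 0 for k in range(n_stages)]


def thermometer_to_binary(code):
    """Count the leading ones (assumes a bubble-free thermometer code)."""
    n = 0
    for bit in code:
        if bit == 0:
            break
        n += 1
    return n


code = delay_line_tdc(95.0, 16, 20.0)
print(code, thermometer_to_binary(code))   # 95 ps quantized to whole 20 ps LSBs
```

A practical encoder must also tolerate "bubbles" (isolated wrong bits caused by metastability or mismatch). The simple leading-ones counter here does not.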

With only digital delay elements and flip-flops, this circuit can be very compact and power-efficient. The minimal resolution for this circuit is the delay of two inverters, which scales along with the minimum feature size of CMOS technologies. Therefore this technique is very attractive for deep submicron technologies.

The resolution can be cut in half by using differential delay lines as shown in Fig. 3b. It uses two delay lines which are kept in opposite phase by cross-coupled inverter pairs. This method also equalizes the rise and fall times of the delay elements and reduces noise sensitivity.

The length of the delay line increases linearly with the required range. For a given range $T_{\text{range}}$ and a unit delay $T_0$, the length $N$ of the delay line must be at least

$$N \geq \frac{T_{\text{range}}}{T_0} \quad (1)$$



This leads to a large area and high power consumption if a high dynamic range is required.

**Fig. 4** TDC with a recycling delay line

The length of the delay line can be reduced with a recycling delay line [1] (Fig. 4), which feeds the propagating signal back to the start of the line so that only a fraction of the total length is needed. A fast counter keeps track of the number of times the signal has cycled through the delay line up to the sampling moment.
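The saving can be quantified with Eq. (1) and a simple coarse/fine decomposition. The counter supplies the number of laps, and the delay-line taps supply the position within the current lap. This is an ideal behavioral sketch with a hypothetical 20 ps unit delay, a 16-stage ring and a 10 ns range. Real designs must also handle the timing between counter and taps at the wrap-around point.

```python
import math


def min_stages(t_range_ps, t0_ps):
    """Eq. (1): a plain delay line needs N >= T_range / T0 stages."""
    return math.ceil(t_range_ps / t0_ps)


def recycling_tdc(interval_ps, n_stages, t0_ps):
    """Recycling delay line: a counter tracks laps, the taps give the fine position.

    Returns (laps, fine_stages, total_lsb) for an ideal, mismatch-free line.
    """
    total = math.floor(interval_ps / t0_ps)   # unit delays traversed
    laps, fine = divmod(total, n_stages)
    return laps, fine, total


print(min_stages(10_000.0, 20.0))          # stages for a plain 10 ns line
print(recycling_tdc(2345.0, 16, 20.0))     # the same LSB with only 16 stages
```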

### 2.3 Phase Locking and Digital Calibration

The output of the TDC is expressed relative to the time resolution (e.g., the number of delay elements whose total delay fits in the time interval to be measured), so this resolution must be known to obtain the correct result. Depending on the application, the resolution must be known either as an absolute time value or relative to the period of the reference clock. The resolution, however, depends on process, voltage, and temperature (PVT) variations. There are two strategies to cope with this issue.

An analog approach uses a delay locked loop (DLL) to lock the total delay of the delay line to the period of the reference clock (see Fig. 5). A phase detector (PD) detects the phase difference between the input and the output of the delay line, and a charge pump (CP) then adjusts a voltage that varies the delay of the delay elements. This control voltage can be, e.g., the supply voltage of the delay elements, the bias voltage of a current source that limits the current to the delay elements, or the control voltage of a voltage-controlled load capacitance (see Fig. 6). This approach, however, reintroduces analog circuits into the digital delay-line approach, which increases area and power consumption.
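The locking behavior can be illustrated with a discrete-time abstraction of the loop. Once per reference cycle, a bang-bang phase detector compares the total line delay with $T_{\text{ref}}$, and the charge pump moves the control voltage by a fixed step. The linear delay-versus-voltage law, the 16 stages, the 400 ps reference period and the step size are all hypothetical. A real DLL is a continuous-time analog loop.

```python
def stage_delay_ps(v_ctrl, t_min_ps=15.0, gain_ps_per_v=10.0):
    """Hypothetical tuning law: stage delay rises linearly with control voltage."""
    return t_min_ps + gain_ps_per_v * v_ctrl


def lock_dll(n_stages, t_ref_ps, v_init=0.0, v_step=0.01, cycles=500):
    """Discrete-time behavioural DLL: a bang-bang PD steers a charge-pump voltage.

    Each reference cycle, the PD compares the line's total delay with T_ref
    and the charge pump adds or removes a fixed charge (a fixed voltage step).
    """
    v = v_init
    for _ in range(cycles):
        total = n_stages * stage_delay_ps(v)
        v += v_step if total < t_ref_ps else -v_step   # line too fast -> slow it down
    return v, stage_delay_ps(v)


v, t0 = lock_dll(16, 400.0)
print(f"control = {v:.2f} V, unit delay = {t0:.2f} ps")   # near T_ref / N
```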

Another approach is digital calibration. The reference clock is periodically measured relative to itself (e.g., during idle times), so that the resulting output of the TDC is a measure of the clock period in terms of the time resolution. This value can be used either to tune the delay line in the same way as a DLL or to rescale the output in the digital domain. Digital rescaling is the preferable strategy, since it needs no tunable delay elements and the rescaling can easily be integrated into other post-processing tasks [9].

**Fig. 5** Analog approach of locking the delay of a TDC using a DLL

**Fig. 6** Some tunable delay elements: (a) with variable load capacitance, (b) with variable supply voltage, (c) with variable supply current
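Digital rescaling amounts to one measurement and one division. The sketch measures the reference period in LSBs and then expresses raw codes as a fraction of that period. The 400 ps period and the PVT shift of the unit delay from a nominal 20 ps to 25 ps are hypothetical.

```python
def calibrate(t_ref_ps, t0_ps):
    """Idle-time calibration: measure one reference period with the TDC itself."""
    return int(t_ref_ps // t0_ps)           # LSBs per reference period


def rescale(raw_code, lsb_per_period):
    """Digital rescaling: express a raw TDC code as a fraction of the reference period."""
    return raw_code / lsb_per_period


T_REF = 400.0            # hypothetical reference period (ps)
T0_ACTUAL = 25.0         # unit delay after a PVT shift (nominal assumed 20 ps)
lsb_per_period = calibrate(T_REF, T0_ACTUAL)
raw = int(200.0 // T0_ACTUAL)                 # an edge at half a period
print(rescale(raw, lsb_per_period))           # calibrated estimate
print(raw / (T_REF / 20.0))                   # estimate assuming the nominal 20 ps
```

The calibrated result places the edge correctly at half a period. Using the nominal delay would misplace it by 20 % of the period, and no tunable delay element is needed to avoid this.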

### 2.4 Linearity and Jitter

The precision of TDC measurements depends not only on the time resolution, but also on various error sources. Therefore, a TDC's precision is usually expressed as the rms single-shot precision [1], which combines all error contributions that occur in a single measurement. If the individual rms values are known and the errors are assumed to be uncorrelated, the rms single-shot precision can be calculated as

$$\sigma_{\text{rms}} = \sqrt{\sigma_q^2 + \sigma_{\text{in}}^2 + \sigma_{\text{ref}}^2 + \sigma_{\text{tdc}}^2 + \sigma_{\text{post}}^2} \quad (2)$$

where $\sigma_q$ is the rms quantization error, $\sigma_{\text{in}}$ and $\sigma_{\text{ref}}$ the rms jitter of the input and reference signals, $\sigma_{\text{tdc}}$ the rms jitter generated by the TDC and $\sigma_{\text{post}}$ the rms error of the post-processing caused by finite precision. This implies that refining the time resolution brings no significant improvement when, e.g., jitter generated by the delay line dominates the precision. It is therefore best to keep the rms single-shot precision within one LSB.
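Equation (2) is a root-sum-square, which the following sketch evaluates. The numbers are hypothetical (8 ps of delay-line jitter, LSB of 10 ps or 5 ps), chosen only to show how little a finer resolution helps once jitter dominates.

```python
import math


def rms_single_shot(sigma_q: float, sigma_in: float, sigma_ref: float,
                    sigma_tdc: float, sigma_post: float) -> float:
    """Eq. (2): root-sum-square of uncorrelated rms error contributions."""
    return math.sqrt(sigma_q**2 + sigma_in**2 + sigma_ref**2
                     + sigma_tdc**2 + sigma_post**2)


# Hypothetical numbers (ps): delay-line jitter of 8 ps dominates.
for lsb in (10.0, 5.0):
    sigma_q = lsb / math.sqrt(12)  # uniform quantization error
    total = rms_single_shot(sigma_q, 0.0, 0.0, 8.0, 0.0)
    print(f"LSB = {lsb:4.1f} ps -> sigma_rms = {total:.2f} ps")
```

Halving the LSB here only moves the single-shot precision from about 8.5 ps to about 8.1 ps.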

### A. Quantization noise

The measured result  $T_m$  of an edge located at time  $T$  can be expressed as

$$T = T_m + q = mT_0 + q \quad (3)$$

where $m$ is the integer TDC output, $T_0$ the time resolution (one LSB) and $q$ the quantization error, uniformly distributed over $[-T_0/2, T_0/2]$. The rms value of $q$ is $T_0/\sqrt{12} \approx 0.29\,T_0$, which is always smaller than one LSB.
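The $T_0/\sqrt{12}$ result can be checked numerically. This Monte Carlo sketch assumes an ideal TDC that rounds to the nearest code, so that $q$ lies in $[-T_0/2, T_0/2]$ as stated above; the function name is hypothetical.

```python
import math
import random


def quantization_rms(t0: float, n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the rms quantization error of an ideal TDC."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        t = rng.uniform(0.0, 1000.0 * t0)  # random edge time
        m = round(t / t0)                   # ideal TDC output code
        q = t - m * t0                      # Eq. (3): T = m*T0 + q
        acc += q * q
    return math.sqrt(acc / n)


t0 = 1.0  # one LSB
print(quantization_rms(t0), t0 / math.sqrt(12))  # both close to 0.289
```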

### B. Integral nonlinearity

Another performance figure is the linearity of the TDC. Variations in the delays of the delay elements, caused by device mismatch and noise, cause differential nonlinearity (DNL) in the measurement outputs. Along the delay line, these DNL errors accumulate into integral nonlinearity (INL). Without delay locking or calibration, the INL error is largest at the end of the delay line. When the line is locked or calibrated, the INL error at the end is forced to zero and the maximum error moves to the middle of the delay line. Its expected rms value is then

$$\sigma_{\text{INL}(N/2)} = \frac{\sqrt{N}}{2} \sigma_{el} \quad (4)$$

with $\sigma_{el}$ the standard deviation of the delay of one element and $N$ the length of the delay line. Again, this value should be less than one LSB. Reducing the length of the delay line therefore not only saves area but also decreases the INL.

For this reason, the maximum expected INL error of a recycling delay line is $\sqrt{M}$ times smaller than that of a classic delay line, where $M$ is the number of times the signal passes through the recycled (and therefore $M$ times shorter) delay line in one reference cycle [1]. For example, Fig. 7 shows that the INL of a recycled delay line with $M = 4$ is a factor of 2 smaller than that of a classic delay line.
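Equation (4) and the $\sqrt{M}$ argument can be checked with a small Monte Carlo sketch. It assumes independent Gaussian element delays and models locking/calibration as forcing the end-point error to zero; because a recycled line reuses the same $N/M$ elements on every pass, its calibrated INL is modeled as that of an $N/M$-element line. Function names and values are hypothetical.

```python
import math
import random


def inl_mid_rms(n_elements: int, sigma_el: float, trials: int = 5000,
                seed: int = 2) -> float:
    """Monte Carlo rms INL at the middle of a calibrated (end-point-corrected) line."""
    rng = random.Random(seed)
    half = n_elements // 2
    acc = 0.0
    for _ in range(trials):
        # Random delay deviation of each element (mismatch).
        errs = [rng.gauss(0.0, sigma_el) for _ in range(n_elements)]
        s_half = sum(errs[:half])
        s_total = s_half + sum(errs[half:])
        # Locking/calibration forces the end-point error to zero, which
        # removes a linear ramp from the accumulated (DNL-summed) error.
        inl_mid = s_half - (half / n_elements) * s_total
        acc += inl_mid * inl_mid
    return math.sqrt(acc / trials)


def inl_eq4(n_elements: int, sigma_el: float) -> float:
    """Eq. (4): expected rms INL at the middle of the delay line."""
    return math.sqrt(n_elements) / 2 * sigma_el


N, M = 64, 4
print(inl_mid_rms(N, 1.0), inl_eq4(N, 1.0))    # classic line: both near 4
print(inl_eq4(N // M, 1.0) / inl_eq4(N, 1.0))  # recycled, M = 4: 0.5
```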

### C. Random Noise

Another source that affects the measurement precision is random noise. The input and reference signals may already carry some jitter, and thermal noise of the delay elements and supply noise add further jitter. This jitter accumulates along the delay line, but every new clean edge that arrives removes the jitter accumulated by the previous edge. Again, these contributions should have an rms value of less than one LSB.

**Fig. 7** INL of a TDC: classic delay line compared with a recycled delay line ($M=4$)
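The jitter accumulation described under Random Noise can be sketched as a random walk, assuming each delay element adds independent jitter that adds in quadrature to the jitter the edge already carries. The per-stage and LSB values below are hypothetical.

```python
import math


def accumulated_jitter(k: int, sigma_stage: float, sigma_in: float = 0.0) -> float:
    """rms jitter of an edge after k delay elements.

    Assumes each element adds independent jitter sigma_stage (thermal and
    supply noise) on top of the jitter sigma_in the edge already carried.
    A new clean edge restarts the accumulation at k = 0.
    """
    return math.sqrt(sigma_in**2 + k * sigma_stage**2)


def max_stages_within_lsb(sigma_stage: float, lsb: float, sigma_in: float = 0.0) -> int:
    """Largest k for which the accumulated jitter stays within one LSB."""
    budget = lsb**2 - sigma_in**2
    if budget < 0:
        raise ValueError("input jitter alone already exceeds one LSB")
    return math.floor(budget / sigma_stage**2)


# Hypothetical: 0.5 ps per stage and a 5 ps LSB allow 100 stages.
print(max_stages_within_lsb(0.5, 5.0))
```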

## 2.5 Subgate-Delay Architectures

A drawback of TDCs based on single or differential delay lines with digital delay elements is that the resolution is limited to one or two inverter delays. Therefore, other techniques have been developed to push the resolution beyond this limit. Figure 8 illustrates three of them.

The Vernier delay line [10] (Fig. 8a) uses two delay chains, one for the input signal and one for the reference signal, where the delay elements of the input chain have a slightly larger delay than those of the reference chain. The position in the Vernier delay line where the reference signal catches up with the input signal gives the time difference between the two signals. The resolution is therefore no longer determined by the delay of a buffer, but by the offset $T_0 = t_{d1} - t_{d2}$ between the two types of delay element, which can be made much smaller. The drawback of this method is increased latency: while the number of delay elements still follows (1), the total delay of the input delay line is now $Nt_{d1} > NT_0$.
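The catch-up principle can be modeled by stepping both edges through the chains. This sketch assumes ideal, mismatch-free elements and that the input edge leads the reference edge by `delta_t`; the function name and the example delays (55 ps and 50 ps, so $T_0 = 5$ ps) are hypothetical.

```python
def vernier_code(delta_t: float, t_d1: float, t_d2: float, max_stages: int = 1024) -> int:
    """Stage at which the reference edge catches up with the input edge.

    delta_t -- time by which the input edge leads the reference edge
    t_d1    -- delay per element of the (slower) input chain
    t_d2    -- delay per element of the (faster) reference chain
    The effective resolution is T0 = t_d1 - t_d2.
    """
    if t_d1 <= t_d2:
        raise ValueError("input-chain delay must exceed reference-chain delay")
    t_in, t_ref = 0.0, delta_t  # arrival times at the current stage
    for k in range(max_stages + 1):
        if t_ref <= t_in:       # reference has caught up: latch k flips
            return k
        t_in += t_d1
        t_ref += t_d2
    raise ValueError("interval exceeds the range of the delay line")


# Hypothetical values in ps: 55 ps and 50 ps elements give T0 = 5 ps.
k = vernier_code(23, 55, 50)
print(k, "stages -> interval between", (k - 1) * 5, "and", k * 5, "ps;",
      "input-chain latency", k * 55, "ps")
```

The 23 ps interval resolves to 5 ps steps, but the input edge has travelled 275 ps by the time the code is known, which illustrates the latency penalty.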

Another method is the pulse-shrinking delay line [11] (Fig. 8b). It uses a single chain of delay elements whose delay differs for rising and falling edges. When a pulse is fed into the delay line, it shrinks after each delay element until it vanishes completely. Again, the position where the pulse disappears gives the pulse width, and the resolution is determined by the offset between the delays of the rising and falling edges. Like the Vernier delay line, this method increases the latency of the TDC.
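A comparable sketch for pulse shrinking, assuming a positive pulse, identical elements that delay the rising (leading) edge more than the falling (trailing) edge, and no noise. The name and values are hypothetical.

```python
def pulse_shrink_code(width: float, t_rise: float, t_fall: float,
                      max_stages: int = 1024) -> int:
    """Number of stages after which a positive pulse has vanished.

    Assumes every element delays the rising (leading) edge by t_rise and the
    falling (trailing) edge by t_fall, with t_rise > t_fall, so the pulse
    shrinks by t_rise - t_fall per stage.
    """
    shrink = t_rise - t_fall
    if shrink <= 0:
        raise ValueError("rising-edge delay must exceed falling-edge delay")
    for k in range(max_stages + 1):
        if width <= 0:
            return k   # pulse has disappeared at the output of stage k
        width -= shrink
    raise ValueError("pulse too wide for the delay line")


print(pulse_shrink_code(23, 55, 50))  # a 23 ps pulse vanishes after 5 stages
```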

A third technique uses passive time interpolation, similar to the voltage interpolation used in interpolating flash ADCs [9] (Fig. 8c). The delay interval of a single buffer in a delay line is interpolated using passive resistors. This increases the resolution in proportion to the number of resistors without increasing the output latency. The precision, however, is limited by the parasitic load capacitances on the output nodes between the resistors.
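In the ideal case, each tap of the resistor chain switches at a weighted average of the two neighboring buffer edges. The sketch below assumes linear edges of equal slope and no parasitic capacitance (the very effect that limits precision in practice), and ignores the common threshold offset that shifts all taps equally. The function name is hypothetical.

```python
from typing import List


def interpolated_edges(buffer_times: List[float], r: int) -> List[float]:
    """Ideal edge times produced by passive interpolation between buffer taps.

    Assumes r equal resistors between consecutive buffer outputs, linear
    edges of equal slope and no parasitic load capacitance, so tap j of r
    crosses the threshold at the weighted average of the two buffer edges.
    """
    edges = []
    for t_a, t_b in zip(buffer_times, buffer_times[1:]):
        for j in range(r):
            edges.append(t_a + (t_b - t_a) * j / r)
    edges.append(buffer_times[-1])
    return edges


# Two buffer delays of 40 ps interpolated by 4 resistors each: 10 ps steps.
print(interpolated_edges([0.0, 40.0, 80.0], 4))
# [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
```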



**Fig. 8** Subgate-delay architectures: (a) vernier delay line, (b) shrinking delay line, (c) passive delay interpolation

Finally, the resolution can also be increased by using passive delay lines such as RC and LC lines [12], since their delay is not limited by the intrinsic delay of an inverter. However, these are very sensitive to mismatch and require each delay cell to be calibrated separately.

## 2.6 Multi-Stage Architectures

Combining a high resolution with a high dynamic range can easily lead to large circuits. For example, if 10 ps resolution is required over a 1 $\mu$s range, at least 100 000 delay elements and latches are needed.
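The element count follows from dividing the range by the resolution, assuming a single delay line with one latch per element:

$$N_{\min} = \frac{T_{\text{range}}}{T_0} = \frac{1\ \mu\text{s}}{10\ \text{ps}} = \frac{10^{-6}\ \text{s}}{10^{-11}\ \text{s}} = 10^{5}$$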

To reduce area and power consumption, a coarse measurement can be combined with a finer measurement using, e.g., a delay line [13]. For the coarse measurement, a counter counts the number of clock periods that fit into the time interval to be measured. The residue, i.e. the distance between the nearest clock edge and the input edge, is then measured by clock interpolation with a delay line or a similar architecture. In theory, the dynamic range of such circuits can be made arbitrarily high without affecting the size of the interpolation stage. However, each stage must be calibrated so that the outputs of the stages match each other.
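One common way to arrange the coarse and fine measurements is sketched below (an assumption; the text does not fix the details): the counter counts whole clock periods between the first clock edges after start and after stop, and an ideal interpolator with resolution `t0` measures each residue. The function name and numbers are hypothetical.

```python
import math


def coarse_fine_measure(t_start: float, t_stop: float, t_clk: float, t0: float) -> float:
    """Counter + interpolator measurement of t_stop - t_start (sketch).

    The counter counts whole clock periods between the first clock edges
    following start and stop; a fine interpolator with resolution t0
    measures each residue, i.e. the time from an input edge to the next
    clock edge.
    """
    c_start = math.ceil(t_start / t_clk)   # index of first clock edge after start
    c_stop = math.ceil(t_stop / t_clk)     # index of first clock edge after stop
    n = c_stop - c_start                   # coarse count
    r_start = c_start * t_clk - t_start    # residues (exact values)
    r_stop = c_stop * t_clk - t_stop
    q_start = round(r_start / t0)          # fine codes of an ideal interpolator
    q_stop = round(r_stop / t0)
    return n * t_clk + (q_start - q_stop) * t0


# Hypothetical numbers in ps: 1 GHz clock, 10 ps interpolator.
print(coarse_fine_measure(1234, 5678, 1000, 10))  # 4450 (true interval: 4444 ps)
```

The fine stage only has to span one clock period, so extending the range only lengthens the counter.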

As an example, Fig. 9 shows a multi-stage TDC with an interpolation ratio of 65 536 with respect to the reference clock, using only 20 delay elements [1]. It uses a digital counter, a coarse interpolation stage based on a recycling delay line, and a fine interpolation stage with two parallel delay lines.

**Fig. 9** Multi-stage TDC [1] with a digital counter (1), a coarse interpolation stage with a recycling DLL (2) and a fine interpolation stage with two parallel delay lines (3)



## 3 Fully Digital Clock and Data Recovery Using a TDC

## 3.1 PLL-Based CDR

A classic analog approach to clock and data recovery is depicted in Fig. 10a. It uses an analog PLL [14] to synchronize a clock with the data, and uses this clock signal to resample the data. The PLL uses a phase detector (PD) that detects the phase difference between the input data and the clock signal. The PD typically generates *up* and *down* pulses whose widths are proportional to the phase error. These pulses drive a charge pump (CP), which converts the phase error into a current that charges or discharges a capacitance. Together with this capacitance, the charge pump also forms a low-pass filter, which stabilizes the loop. The resulting control voltage drives a voltage-controlled oscillator (VCO) towards phase lock.



**Fig. 10** PLL-based CDR: (a) analog PLL, (b) PLL with a TDC as a digital phase detector, (c) All-Digital PLL

More digital approaches replace several or all of these blocks with digital versions. In Fig. 10b, for example, the PD is replaced by a TDC that outputs the phase error as a digital value. The charge pump and low-pass filter are then replaced by a digital filter, and a D/A converter converts the result back to an analog control voltage for the VCO.

A further step replaces the VCO with a digitally controlled ring oscillator (DCO), leading to an all-digital PLL (ADPLL) [6–8] (Fig. 10c). The precision of the TDC does not have to be very high, since the TDC error is averaged out by the loop filter, although higher precision can lead to faster lock times. Nevertheless, the acquisition time remains relatively long because of the closed-loop configuration, and PLL-based architectures are considered unsuitable for applications that require very fast or even immediate acquisition.
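The closed-loop behavior of Fig. 10c can be sketched with a simple time-domain model: a TDC that quantizes the phase error, a digital proportional-integral (PI) loop filter, and a DCO whose period is linear in the control word. This is a hypothetical model (no divider, continuous control word, illustrative gains of 0.5 and 0.1, a DCO starting 1 % off frequency), not the design of [6–8].

```python
def simulate_adpll(cycles: int = 200, t_ref: float = 1000.0, p0: float = 1010.0,
                   k_dco: float = 1.0, t0: float = 5.0,
                   kp: float = 0.5, ki: float = 0.1) -> list:
    """Phase error (DCO edge minus reference edge, ps) after each reference cycle.

    Hypothetical time-domain ADPLL model: the TDC quantizes the phase error
    with resolution t0, a digital PI loop filter produces the DCO control
    word, and the DCO period is p0 + k_dco * control (no divider).
    """
    t_ref_edge = t_dco_edge = 0.0
    integ = 0.0
    errors = []
    for _ in range(cycles):
        e = round((t_ref_edge - t_dco_edge) / t0) * t0   # TDC: quantized phase error
        integ += ki * e                                   # integral path
        control = kp * e + integ                          # PI loop filter
        t_dco_edge += p0 + k_dco * control                # next DCO edge
        t_ref_edge += t_ref                               # next reference edge
        errors.append(t_dco_edge - t_ref_edge)
    return errors


err = simulate_adpll()
print(f"initial |error|: {abs(err[0]):.1f} ps, "
      f"settled max |error|: {max(abs(x) for x in err[100:]):.2f} ps")
```

In this model the loop takes a number of reference cycles to settle, after which the residual phase error stays within one TDC LSB (5 ps), consistent with the TDC error being averaged by the loop filter.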

## 3.2 Burst-Mode CDR

PLL-based CDR circuits are closed-loop systems that reduce the phase error until lock is reached between the clock and the input data. This leads to a long acquisition time, especially if a wide frequency range is required. In contrast, open-loop systems can have a very fast lock time and need only a few or even no preamble bits to lock the clock to the incoming data. Most implementations are based on a gated ring oscillator [15, 16] that is triggered by a data edge (see Fig. 11).

The resulting clock is therefore immediately aligned with the data edge and can be used to sample the incoming data. The oscillation frequency is locked to the reference clock by using tunable delay elements. However, if this reference frequency deviates too much from the original data clock, recovery becomes problematic, especially during long runs of 1's or 0's. Therefore, most of these burst-mode CDRs can only operate at a fixed frequency that must be known in advance. Multirate burst-mode CDRs [17] offer several operating frequencies by using a configurable frequency divider, but they still do not offer a continuous frequency range of operation.

Another technique for burst-mode CDR is blind oversampling [18, 19] (see Fig. 12). The data is sampled at multiple instants within each bit period, and the best sample is selected for data recovery by majority voting or center picking. As with the gated oscillator, this approach only works when the data rate is known in advance, since it provides no clock recovery.
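Center picking can be sketched in a few lines. The sketch assumes a fixed, known oversampling ratio (here 3, reflecting the known-data-rate requirement) and an odd ratio so that a single center phase exists; majority voting would instead combine the samples of each bit. The function name is hypothetical.

```python
from collections import Counter
from typing import List


def center_pick(samples: List[int], os_ratio: int = 3) -> List[int]:
    """Recover bits from blindly oversampled data by center picking.

    samples -- data sampled os_ratio times per bit period (free-running clock)
    The most frequent transition phase marks the bit boundary; the sample
    phase half a bit later (odd os_ratio assumed) is used as the data sample.
    """
    phases = Counter(i % os_ratio for i in range(1, len(samples))
                     if samples[i] != samples[i - 1])
    edge_phase = phases.most_common(1)[0][0] if phases else 0
    center = (edge_phase + os_ratio // 2) % os_ratio
    return samples[center::os_ratio]


bits = [1, 0, 1, 1, 0, 0, 1, 0]
# 3x oversampling with an arbitrary one-sample phase offset between clock and data.
samples = [bits[0]] + [b for b in bits for _ in range(3)][:-1]
print(center_pick(samples) == bits)  # True
```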



**Fig. 11** Gated ring oscillator for burst-mode CDR



**Fig. 12** Oversampling CDR

## 3.3 Proposed Schematic for an All-Digital TDC-Based Burst-Mode CDR

Figure 13 presents a new approach to clock and data recovery. It uses a high-precision TDC that measures the location of the data transitions. These measurements are fed to a decision block, which interprets the data and drives a generation block that simultaneously reconstructs the incoming data and generates a synchronized clock signal along with the output data. Both the TDC and the generator work with multiple phases of a reference clock. Data recovery in this scheme essentially consists of finding the clock phase closest to each data transition and using this information to extract a clock signal and to generate the synchronized output data. Since the TDC can instantaneously detect every data transition with high precision, immediate acquisition is possible. Moreover, since the operation of the TDC is independent of the data rate, this architecture can theoretically handle a continuous range of input data rates up to the frequency of the reference clock. Hence, it combines the flexible input frequency of PLL-based CDRs with burst-mode operation.

**Fig. 13** TDC-based burst-mode CDR

**Fig. 14** Illustration of data recovery with a TDC with $N = 4$

Figure 14 illustrates the operation of the TDC-based burst-mode CDR for 4 clock phases. For each data edge, the TDC outputs a value $t$ representing the nearest clock phase. This information is then used to reconstruct the data and to generate the output clock. Notice that only two preamble bits (1-0) are sufficient to extract the clock signal, and that this information is updated after every new data transition. Even without a preamble, the TDC results can be stored in a register until a 1-0-1 or a 0-1-0 data pattern occurs, from which the clock signal can be calculated. Data reconstruction can then be started using the stored data transitions.
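The decision step can be sketched as follows. This is a hypothetical simplification, not the authors' decision block: the bit period is estimated as the shortest spacing between stored edge timestamps (valid once a 1-0-1 or 0-1-0 pattern has occurred), and each gap is then converted into a run of identical bits.

```python
from typing import List, Optional, Tuple


def recover_bits(edge_times: List[int], first_level: int,
                 ui: Optional[float] = None) -> Tuple[float, List[int]]:
    """Reconstruct data bits from TDC timestamps of successive data edges.

    edge_times  -- edge positions in TDC delay units (already unwrapped
                   with the reference-cycle count)
    first_level -- data level just after the first edge (0 or 1)
    ui          -- bit period in delay units; if None it is estimated as the
                   shortest edge spacing, assuming a 1-0-1 or 0-1-0 pattern
                   has occurred
    """
    gaps = [b - a for a, b in zip(edge_times, edge_times[1:])]
    if ui is None:
        ui = min(gaps)
    bits: List[int] = []
    level = first_level
    for gap in gaps:
        run = max(1, round(gap / ui))  # number of identical bits in this run
        bits.extend([level] * run)
        level ^= 1                     # every edge toggles the data level
    return ui, bits


# Hypothetical edges for the bits 1,0,1,1,1,0,0 at 10 units per bit,
# each timestamp off by up to one TDC LSB:
print(recover_bits([0, 11, 20, 49, 70], first_level=1))  # (9, [1, 0, 1, 1, 1, 0, 0])
```

Note that with the bit period estimated as 9 instead of 10 units, a run of five bits (a gap of about 50) would already be read as six: the error grows with run length, which is the limitation discussed in the next paragraph.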

The drawback of this method compared to other architectures is that it produces deterministic jitter, which depends on the TDC resolution, instead of stochastic jitter, which depends on mismatch and noise. This deterministic jitter occurs at every transition and accumulates while the data remains constant, which limits the maximum number of successive 1's or 0's that can be processed correctly. For example, if the data rate is 10 Gb/s (a 100 ps bit period) and the TDC resolution is 10 ps, the maximum run length is 5, since in the worst case the accumulated error then reaches 50 ps. The extracted clock is then shifted by half a period with respect to the original data clock, and the data can no longer be recovered correctly. However, the precision of the TDC scales with technology, so the deterministic jitter decreases in newer technology nodes.
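The run-length bound in this example follows from dividing half a bit period by the worst-case error added per bit, assumed here (as the numbers in the text imply) to be one TDC LSB $T_0$:

$$N_{\text{run}} = \frac{T_{\text{bit}}/2}{T_0} = \frac{100\ \text{ps}/2}{10\ \text{ps}} = 5$$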

Table 2 gives an overview of the characteristics of the proposed architecture compared with PLL-based and burst-mode CDRs.

**Table 2** Comparison of proposed scheme with PLL-based and burst-mode CDRs

|                     | PLL-based                             | Burst-mode                            | Proposed                                |
|---------------------|---------------------------------------|---------------------------------------|-----------------------------------------|
| **Lock time**       | Very slow                             | Fast                                  | Fast                                    |
| **Frequency range** | Limited                               | Fixed data rate                       | Variable data rate                      |
| **Jitter**          | Stochastic, depends on noise/mismatch | Stochastic, depends on noise/mismatch | Deterministic, depends on TDC resolution |
| **Scalability**     | Bad                                   | Good                                  | Good                                    |

## 4 TDC Implementation

As a design example, a prototype of the TDC of Fig. 13 has been designed in 0.13 $\mu\text{m}$ CMOS technology. It is designed for clock frequencies up to 500 MHz, with a time resolution tunable between 50 and 150 ps, and can handle a continuous range of data rates up to the clock frequency. The TDC is designed for digital calibration, using the measured data to extract the reference period. This is a purely digital operation and is therefore not implemented on chip.

The block schematic of the TDC is depicted in Fig. 15. It consists of a differential delay line of 128 tunable delay elements. Because digital calibration is used, the total delay of the delay line does not have to be locked to the period of the reference clock. However, the delay line must cover at least the entire period, with some extra margin for delay variations caused by supply noise, temperature changes and process variations. This coverage is obtained by tuning the time resolution.

**Fig. 15** The TDC block

The delay elements are followed by sample-and-hold (SnH) flipflops that sample the data at different phases of the reference clock. The samples are then synchronized on the rising edge of the reference clock, after which edge detectors find the clock phase that is closest to the data edge. This information is finally fed to a 128-to-7 binary encoder, which gives the final output as a digital value.
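A software model of the edge-detection and encoding step is sketched below. It is not the gate-level encoder: the function names are hypothetical, and the 16-tap waveform is an invented example with rising transitions at taps 4 and 15. It returns the first two transitions, anticipating the two-output encoder ($m_1$, $m_2$) used for calibration.

```python
from typing import List, Optional, Tuple


def detect_edges(samples: List[int], rising: bool = True) -> List[int]:
    """Tap indices where adjacent synchronized samples show a 0-1 (or 1-0) step."""
    want = (0, 1) if rising else (1, 0)
    return [i for i in range(1, len(samples))
            if (samples[i - 1], samples[i]) == want]


def encode(samples: List[int], rising: bool = True,
           width: int = 7) -> Tuple[Optional[str], Optional[str]]:
    """Binary codes m1 (measurement) and m2 (second transition, for calibration)."""
    hits = detect_edges(samples, rising)[:2]
    codes = [format(i, f"0{width}b") for i in hits]
    codes += [None] * (2 - len(codes))
    return codes[0], codes[1]


# Hypothetical 16-tap example: rising transitions at taps 4 and 15.
s = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
print(detect_edges(s), encode(s))  # [4, 15] ('0000100', '0001111')
```

Seven output bits suffice because the tap index of a 128-element line never exceeds 127.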

Figure 16 shows one slice in detail. Each delay element consists of two inverters and a cross-coupled inverter pair that keeps the outputs in opposite phases. The delay elements are tuned with voltage-controlled load capacitances (see Fig. 6a), so that the time resolution can be varied between 50 and 150 ps.

**Fig. 16** A TDC slice in detail

The data is then sampled with differential flipflops. Because the location of the data edges is unknown, setup-time violations of certain SnH flipflops are inevitable, possibly resulting in a metastable output. To significantly reduce the chance of a metastable output in these cases, two flipflops are used in series. The second flipflop eliminates any metastable state of the first one that resolves within one clock period.

An example of a time measurement is shown in Fig. 17. For illustration purposes, a TDC with $N=16$ is used. Figure 17a shows the waveforms after the SnH flipflops. The clock phases nearest to the data edges can easily be found in this figure by detecting the 0-1 transitions between adjacent waveforms for the rising data edges, and the 1-0 transitions for the falling edges. Notice also that, since the total delay is longer than the period of the reference clock, two transitions are visible for the rising edge in this example. Only the first one is valid for the time measurement. However, the difference between the two is a measure of the reference period in terms of the time resolution. This information can therefore be used for digital calibration, to normalize the measurement results to the reference clock.

**Fig. 17** TDC Waveforms: (a) output of the SnH flipflops, (b) output of the synchronization flipflops (bars) and the edge detector (numbers)

After sampling, the outputs of the SnH flipflops must be synchronized with the reference clock. Again, this will lead to setup-time violations in some slices. If a setup-time violation occurs in the slice with the transition, the measurement can become uncertain by one clock period. In the worst case, the waveform in question is shifted by one clock period, and the decision block misinterprets the original data edge as occurring one clock period later. To solve this problem, the data is sampled at both the rising and the falling clock edges, after which the valid sample is selected. The selection depends on the location of the data edge relative to the clock edge: if the data edge lies between the rising and the falling clock edge, it is safe to sample on the rising edge without any risk of a setup-time violation, and vice versa. In Fig. 16, this is determined by the selection signals $\text{phase}_r$ for rising data edges and $\text{phase}_f$ for falling data edges. After that, the samples are synchronized at the rising clock edge.

Finally, the synchronized samples are fed to an edge detector, which detects the 0-1 and 1-0 transitions between the waveforms that mark the nearest clock phase. Since there can be more than one transition, the second of which is used for calibration, the binary encoder is modified to output two digital values, $m_1$ and $m_2$.

Figure 17b shows, for the same example, the result of the synchronization step and the output of the edge detector and encoder. From the measurement of the rising edge, the reference period is found to be 11 delay units. The data pulse therefore has a length of $11 - 4 + 11 + 9 = 27$ delay units.
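One reading of this arithmetic, assumed here since Fig. 17 is not reproduced, is that the rising edge has code 4, the falling edge code 9 two reference cycles later, so the pulse spans the rest of the first cycle, one full cycle and 9 units of the third. The codes $m_1 = 4$, $m_2 = 15$ are likewise assumed (only their difference of 11 is given); the function name is hypothetical.

```python
def pulse_length(m_rise: int, m_fall: int, period: int, cycles_between: int) -> int:
    """Pulse length in delay units.

    m_rise, m_fall -- TDC codes of the rising and falling data edges
    period         -- reference period in delay units (m2 - m1 of the calibration)
    cycles_between -- reference cycles between the cycle of the rising edge
                      and the cycle of the falling edge
    """
    return cycles_between * period + m_fall - m_rise


period = 15 - 4  # m2 - m1 of the rising edge (assumed m1 = 4, m2 = 15)
length = pulse_length(4, 9, period, 2)
print(period, length, length / period)  # 11 27, about 2.45 reference periods
```

Dividing by the calibrated period, as in the last line, expresses the pulse in reference periods independently of the PVT-dependent resolution.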

## 4.1 Performance and Test Chip

The TDC is implemented in 130 nm standard CMOS technology operating at 1.5 V (Fig. 18). The core area of the chip is $0.8 \text{ mm} \times 0.8 \text{ mm}$. Table 3 gives an overview of the measured performance of the chip. Figure 19 shows the linearity of the time measurement for an operating frequency of 350 MHz at minimal time resolution. The plot is generated by applying a square wave at 1/8 of the clock frequency and sweeping the skew between the input and the clock signal. From the measurements, the clock period relative to the resolution can be derived by taking the difference between $m_1$ and $m_2$, which equals 47 delay units. This corresponds to a time resolution of 61 ps. The DNL and INL are within $\pm 1$ LSB over the whole measurement range. Figure 20 shows that the rms jitter of the delay line is 4.2 ps, well within 1 LSB. The measured power consumption is 25 mW.
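The quoted resolution follows from the calibration count and the 350 MHz reference period:

$$T_0 = \frac{T_{\text{ref}}}{m_2 - m_1} = \frac{1/(350\ \text{MHz})}{47} \approx \frac{2.857\ \text{ns}}{47} \approx 60.8\ \text{ps} \approx 61\ \text{ps}$$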



**Fig. 18** Picture of the TDC chip layout. The core TDC circuit measures  $0.8 \text{ mm} \times 0.8 \text{ mm}$

**Table 3** Measured chip performance

| Parameter                | Value                                       |
|--------------------------|---------------------------------------------|
| Operating frequency      | 350 MHz                                     |
| Measured resolution      | 61 ps                                       |
| RMS jitter of delay line | 4.2 ps                                      |
| Maximum INL              | $\pm 1$ LSB                                 |
| Power consumption        | 25 mW                                       |
| Silicon area (core)      | $0.8 \text{ mm} \times 0.8 \text{ mm}$      |



**Fig. 19** Sweep over the whole measurement range and the non-linearity of the TDC

## 5 Conclusion

In this paper, a new scheme for a fully digital CDR circuit is proposed, which combines immediate acquisition with a continuous frequency range. It uses a TDC that measures the time location of the incoming data edges. This time information is passed to a decision block, which extracts the clock and data information. A waveform generator then reconstructs the original data and a synchronized clock. Since the TDC can immediately and precisely measure every data edge, synchronization is obtained instantaneously. Moreover, because there is no relation between the TDC reference clock and the incoming data rate, the data rate can theoretically take any value up to the reference frequency. The precision of this CDR scales with technology, which makes it especially attractive in deep-submicron technologies, with increasing accuracy and decreasing power consumption compared with analog alternatives.

**Fig. 20** Jitter plot of the output of the delay line

A design example of a prototype TDC in 0.13 $\mu$m CMOS technology is also presented. It is designed for frequencies up to 500 MHz, with a time resolution tunable between 50 and 150 ps, and can handle a continuous range of data rates up to the clock frequency. Measurements show 61 ps resolution with 4.2 ps rms jitter at a 350 MHz operating frequency, with a power consumption of 25 mW.

## References

1. Jansson, J.-P., Mantyniemi, A., Kostamovaara, J., "A CMOS time-to-digital converter with better than 10 ps single-shot precision," *Solid-State Circuits, IEEE Journal*, vol. 41, no. 6, pp. 1286–1296, June 2006.
2. Steyaert, M., Peluso, V., Bastos, J., Kinget, P., Sansen, W. "Custom analog low power design: The problem of low voltage and mismatch," *1997 IEEE Custom Integrated Circuits Conference*, pp. 285–292, May 1997.
3. Bult, K. "Analog design in deep sub-micron CMOS," *Proceedings of the 26th European Solid-State Circuits Conference*, September 2000.
4. Daniels, J., Dehaene, W., Steyaert M., Wiesbauer, A., "A/D Conversion Using an Asynchronous Delta-Sigma Modulator and a Time-to-Digital Converter," *International Symposium on Circuits and Systems*, May 2008.
5. Staszewski, R.B., Vemulapalli, S., Vallur, P., Wallberg, J., Balsara, P.T., "1.3 V 20 ps time-to-digital converter for frequency synthesis in 90-nm CMOS," *Circuits and Systems II: Express Briefs, IEEE Transactions*, vol. 53, no. 3, pp. 220–224, March 2006.
6. Staszewski, R.B., Wallberg, J.L., Rezeq, S., Chih-Ming H., Eliezer, O.E., Vemulapalli, S.K., Fernando, C., Maggio, K., Staszewski, R., Barton, N., Meng-Chang L., Cruise, P., Entezari, M., Muhammad, K., Leipold, D., "All-digital PLL and transmitter for mobile phones," *Solid-State Circuits, IEEE Journal*, vol. 40, no. 12, pp. 2469–2482, Dec. 2005.
7. Staszewski R.B., Balsara, P.T., "All-Digital PLL With Ultra Fast Settling," *Circuits and Systems II: Express Briefs, IEEE Transactions*, vol. 54, no. 2, pp. 181–185, Feb. 2007.
8. Kratyuk, V., Hanumolu, P.K., Moon, U.-K., Mayaram, K., "A Design Procedure for All-Digital Phase-Locked Loops Based on a Charge-Pump Phase-Locked-Loop Analogy," *Circuits and Systems II: Express Briefs, IEEE Transactions*, vol. 54, no. 3, pp. 247–251, March 2007.
9. Henzler, S., Koeppe, S., Lorenz, D., Kamp, W., Kuenemund, R., Schmitt-Landsiedel, D., "Variation tolerant high resolution and low latency time-to-digital converter," *33rd European Solid State Circuits Conference, 2007. ESSCIRC*, pp. 194–197, 11–13 Sept. 2007.
10. Dudek, P. et al., "A high-resolution CMOS time-to-digital converter utilizing a vernier delay line," *Solid-State Circuits, IEEE Journal*, vol. 35, no. 2, 2000.
11. Chen, P., Liu, S.-L., Wu, J., "A CMOS pulse-shrinking delay element for time interval measurement," *Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions*, vol. 47, no. 9, pp. 954–958, September 2000.
12. Mota, M., Christiansen, J., "A high-resolution time interpolator based on a delay locked loop and an RC delay line," *Solid-State Circuits, IEEE Journal*, vol. 34, no. 10, pp. 1360–1366, October 1999.
13. Nutt, R. "Digital time intervalometer," *The Review of Scientific Instruments* vol. 39, pp. 1342, 1968.
14. Gardner, F., "Charge-pump phase-lock loops," *IEEE Transactions on Communications*, vol. COM-28, no. 11, pp. 1849–1858, November 1980.
15. Nakamura, M., Ishihara, N., Akazawa, Y., "A 156 Mbps CMOS Clock Recovery Circuit for Burst-mode Transmission," *Symp. VLSI Circuits*, pp. 122–123, June 1996.
16. Nogawa, M., Nishimura, K., Kimura, S., Yoshida, T., Kawamura, T., Togashi, M., Kumozaki, K., Ohtomo, Y., "A 10 Gb/s burst-mode CDR IC in 0.13 $\mu$m CMOS," *Solid-State Circuits Conference, 2005. Digest of Technical Papers. ISSCC. 2005 IEEE International*, vol. 1, pp. 228–295, 10 Feb. 2005.
17. Kobayashi, S., Hashimoto, M., "A multibitrate burst-mode CDR circuit with bit-rate discrimination function from 52 to 1244 Mb/s," *Photonics Technology Letters, IEEE*, vol. 13, no. 11, pp. 1221–1223, November 2001.
18. Yang, C.-K.K., Farjad-Rad, R., Horowitz, M.A., "A 0.5-$\mu$m CMOS 4.0-Gbit/s serial link transceiver with data recovery using oversampling," *Solid-State Circuits, IEEE Journal*, vol. 33, no. 5, pp. 713–722, May 1998.
19. Ahmed, S.I., Kwasniewski, T.A., "Overview of oversampling clock and data recovery circuits," *Electrical and Computer Engineering, 2005. Canadian Conference*, pp. 1876–1881, 1–4 May 2005.

## Part II

# High-Performance Amplifiers

The second chapter of this book is on high-performance amplifiers. Amplifiers come in different flavors. Here, several types of instrumentation amplifiers, basic IC building-block amplifiers, and audio power amplifiers are discussed, all with their special requirements and challenges.

The first paper, by Johan Huijsing, gives an overview of instrumentation amplifiers: amplifiers in which precise gain, low offset and high CMRR must be achieved simultaneously, which calls for a special approach. It describes the requirements on the basis of the applications and addresses the developments in the field: aspects such as the three-opamp configuration, indirect current feedback, auto-zeroing and chopping are discussed. The paper systematically builds up the techniques available today, and thus provides a thorough overview.

The second paper, by Wilko Kindt, addresses a specific subclass of instrumentation amplifiers: current-sense amplifiers, in which a current is sensed via a shunt resistor. More specifically, it addresses the problem of an input common-mode voltage range that is larger than the supply range, and potentially even beyond the technology specification, as can be encountered in current-sensing applications. On the other hand, high input impedances are not required in these applications. These properties distinguish these amplifiers from the instrumentation amplifiers discussed in the first paper. Several topologies, in both low-voltage and high-voltage technology, are discussed.

The third paper, by Ramón González Carvajal et al., addresses the power efficiency of basic on-chip amplifier structures at very low supply voltages, for applications that must be implemented in emerging technologies. Class-AB systems are addressed because they are power efficient while still being fast, since they are not slew-rate limited. A special focus is on new design techniques that aim for current efficiency, keeping both static and dynamic power consumption low. Super-class-AB amplifiers with dynamic biasing, and quasi-floating-gate amplifiers with efficient implementations of the DC level shift, are introduced and analyzed.

The next two papers address the high-accuracy amplifier problem from the application area of very-high-sensitivity bio- and nano-biosensors. Timothy Denison and Reid Harrison describe the challenges for the amplifiers in the readout circuits for two application areas: detecting spike signals from individual neurons for a neuroprosthesis, and detecting the field potential of large ensembles of neurons to measure biomarkers, such as those of a seizure. The challenges include very weak signals, high spatial resolution in the case of single-cell recording, signals buried in 1/f and white noise in the same frequency band, and severe power constraints, not just for power-supply reasons but also to prevent damage to the brain.

The fifth paper, by Marco Sampietro et al., describes an extremely sensitive current-sensing amplifier for the characterization of nano-biodevices. The currents to be sensed are on the order of picoamperes at a bandwidth of 100 kHz, the resistances required to sense these currents and to realize DC decoupling are on the order of several tens up to several hundreds of gigaohms, and the capacitances are on the order of attofarads. Together, these make the realization of such an amplifier a very specific challenge.

The last paper, by Marco Berkhout, addresses a completely different field for high-performance amplifiers: hi-fi audio with class-D amplifiers. Distortion is a key issue here, but not the only one. In fact, the paper focuses on robustness and the prevention of audible artifacts, which consume most of the effort in the design of these amplifiers. In particular, the artifacts arising from switching, and the solutions found for them, are highlighted.

# Dynamic Offset Cancellation in Operational Amplifiers and Instrumentation Amplifiers

Johan H. Huijsing

**Abstract** This paper gives an overview of techniques that achieve low offset, low noise and high accuracy in CMOS operational amplifiers (OA or OpAmp) and instrumentation amplifiers (IA or InstAmp). Auto-zero and chopper techniques are used both separately and in combination. Frequency-compensation techniques are shown that obtain straight roll-off amplitude characteristics in the multi-path architectures of chopper-stabilized amplifiers, so that these amplifiers can be used in standard feedback networks. Offset voltages lower than 1 $\mu$V can be achieved.

## 1 Introduction

The combination of an accurate voltage gain $A_v$, a low input offset voltage $V_{os}$ and a high common-mode rejection ratio (CMRR) cannot easily be implemented.

The type of amplifier that comes closest, with a low offset and a high CMRR, is the operational amplifier (OA). However, this amplifier does not have a well-determined gain. The gain of an OA is normally so high that feedback around the OA is needed to produce an accurate result [1]. This situation is depicted in Fig. 1.
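Why feedback makes the gain accurate can be seen from the standard closed-loop expression $A/(1 + A\beta)$. The sketch assumes a non-inverting configuration with feedback factor $\beta = R_1/(R_1 + R_2)$ (Fig. 1 does not fix the network) and hypothetical resistor and open-loop gain values: the gain is set by the resistor ratio, and the poorly defined open-loop gain only enters through the error term $1/(A\beta)$.

```python
def closed_loop_gain(a_ol: float, r1: float, r2: float) -> float:
    """Non-inverting amplifier gain A/(1 + A*beta) with beta = R1/(R1 + R2)."""
    beta = r1 / (r1 + r2)
    return a_ol / (1.0 + a_ol * beta)


# Hypothetical resistors for an ideal gain of 1 + R2/R1 = 100.
for a_ol in (1e4, 1e6):
    g = closed_loop_gain(a_ol, 1e3, 99e3)
    print(f"A_OL = {a_ol:.0e}: gain = {g:.3f}, relative error = {100 / g - 1:.1e}")
```

A hundredfold change in open-loop gain changes the closed-loop gain by only about 1 %, which is why the OA is used inside a feedback network.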



**Fig. 1** Operational Amplifier in Feedback Network, $V_{id} = 0$, $I_{id} = 0$, $I_{ic}\,(\text{CM}) = 0$ (CMRR = High)

J.H. Huijsing (✉)  
Delft University of Technology, Delft, The Netherlands  
e-mail: j.h.huijsing@tudelft.nl

**Fig. 2** Instrumentation Amplifier,  $V_{id} \neq 0$ ,  $I_{id} = 0$ ,  
 $I_{ic}$  (CM)= 0,  $V_{od} = A_v V_{id}$   
CMRR = High



The feedback destroys the CMRR at the feedback-network input. Therefore, other ways have to be found to combine an accurate voltage gain, a low offset, and a high CMRR.

Instrumentation amplifiers (IA) can have the combination of accurate gain, low offset voltage  $V_{os}$ , and high common-mode rejection ratio. But they are more difficult to implement than operational amplifiers. A general symbol for an instrumentation amplifier is given in Fig. 2.

This paper discusses the following developments in the design of InstAmps:

1. Introduction.
2. Application of IA.
3. Three-OpAmp IA.
4. Current-Feedback IA.
5. Auto-Zeroing.
6. Chopping.
7. Chopper-Stabilization.
8. Chopping + AZ or Chopper-Stabilized.
9. Summary

## 2 Applications of Instrumentation Amplifiers

All applications of an IA use the combination of accurate gain and high CMRR. The first application example is a general one: overcoming a ground loop. This occurs when we want to transfer a voltage signal referred to a ground potential $V_{sRef}$ that differs from the reference potential $V_{oRef}$ at the destination. The situation is depicted in Fig. 3.

This is the case, for instance, when an instrument has to interface a sensor, like a thermocouple, that is connected to a remote ground. The small output voltage of the thermocouple requires a low offset voltage of the amplifier, while the remote

**Fig. 3** Instrumentation Amplifier bridging the common-mode voltage between  $V_{sRef}$  and  $V_{oRef}$



**Fig. 4** Instrumentation Amplifier for the readout of a sensor Bridge



ground can have a large potential difference with respect to the ground of the sensing instrument. This requires a high CMRR.

A second common application is the interfacing of the differential output voltage $V_{Bd}$ of a sensor bridge that has a large common-mode voltage $V_{BCM}$, as shown in Fig. 4. Accuracy and low offset of the measurement are of high priority in this application.

A third application example is monitoring the voltage $V_{Rsd}$ across a current-sense resistor $R_s$ in the supply lines of battery-powered systems such as cell phones and laptops. Concerns about power management and battery life are rapidly making this application more important.

A high dynamic range is required for the current-sense application, as we want to measure both high and low supply currents reasonably accurately without dissipating a large amount of power in the sense resistor at high currents. This means that the IA, or “current-sense” amplifier, needs a low offset voltage at high CM input voltages. The CM input voltage range may even lie far above the supply voltage and/or need a rail-to-rail span. This considerably complicates the design of the IA.

A final application example is sensing voltage differences between skin electrodes for measuring a person's ECG, EEG, or EMG. These differential voltages are on the order of $100 \mu\text{V}$ to $1 \text{ mV}$, in the presence of large CM voltages on the order of $10\text{--}100 \text{ V}$ from mains-operated lamps and other sources. A high CMRR and patient safety are the main requirements here.



**Fig. 5** Instrumentation Amplifier for interfacing a current-sense resistor

**Fig. 6** Instrumentation Amplifier for interfacing medical electrodes



## 3 Three-OpAmp Instrumentation Amplifiers

The most common approach to an IA is the three-OpAmp topology shown in Fig. 7 (see also Fig. 3.2.2 in [2]).



**Fig. 7** Three-OpAmp Instrumentation Amplifier with Resistor-Bridge feedback and input buffer amplifier

The actual IA consists of an OA that is fed back by a resistor bridge network $R_{11}$, $R_{12}$, $R_{13}$, and $R_{14}$. If the bridge is balanced, the gain for differential signals is:

$$A_d = R_{12}/R_{11} = R_{14}/R_{13} \quad (1)$$

To achieve a high input impedance, buffer amplifiers OA<sub>2</sub> and OA<sub>3</sub> have been placed in front of the bridge resistors. These amplifiers are connected in a non-inverting gain configuration with $R_{21}$, $R_{22}$, and $R_{23}$. Their additional differential gain is:

$$A_{d2} = (R_{21} + R_{22} + R_{23})/R_{21} \quad (2)$$

The total voltage gain is:

$$A_V = (R_{21} + R_{22} + R_{23})R_{12}/(R_{11} R_{21}) \quad (3)$$
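As a quick numerical check of Eqs. (1) to (3), the short Python sketch below computes the magnitudes of the bridge gain, the buffer gain, and the total gain. The function name and the resistor values are hypothetical; the sketch assumes ideal OpAmps and simply rejects an unbalanced bridge.

```python
def three_opamp_gain(r11, r12, r13, r14, r21, r22, r23):
    """Return (A_d, A_d2, A_V) for the three-OpAmp IA (hypothetical helper).

    Assumes ideal OpAmps and a balanced bridge, R12/R11 == R14/R13.
    """
    a_d = r12 / r11                       # bridge (difference) stage gain, Eq. (1)
    if abs(a_d - r14 / r13) > 1e-9 * a_d:
        raise ValueError("bridge is not balanced: R12/R11 != R14/R13")
    a_d2 = (r21 + r22 + r23) / r21        # input-buffer differential gain, Eq. (2)
    return a_d, a_d2, a_d * a_d2          # total gain A_V, Eq. (3)


# Hypothetical resistor values in ohms: bridge gain 10, buffer gain 10
a_d, a_d2, a_v = three_opamp_gain(1e3, 10e3, 1e3, 10e3, 2e3, 9e3, 9e3)
print(a_d, a_d2, a_v)  # 10.0 10.0 100.0
```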

The main problem of the three-OpAmp approach is the CMRR. In this topology the CMRR depends on the matching of the feedback bridge resistors, as explained in [2]:

$$CMRR = (R/\Delta R)A_V \quad (4)$$

in which $\Delta R/R$ is the relative error of one of the bridge resistors with respect to the value that would balance the bridge. For instance:

$$\Delta R_{11}/R_{11} = 1 - R_{11}R_{14}/(R_{12}R_{13}) \quad (5)$$
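The sketch below, again with hypothetical values, evaluates Eq. (5) as the relative deviation of $R_{11}$ and inserts it into Eq. (4). It assumes that only $R_{11}$ is off and treats Eq. (4) as a first-order estimate; with $A_V = 100$ and a 0.1 % resistor error it gives a CMRR of about 100 dB.

```python
import math


def bridge_mismatch(r11, r12, r13, r14):
    """Relative error of R11 w.r.t. the balancing value, Eq. (5):
    1 - R11*R14/(R12*R13)."""
    return 1.0 - r11 * r14 / (r12 * r13)


def cmrr_db(a_v, rel_error):
    """CMRR = A_V * R/dR from Eq. (4), expressed in dB."""
    return 20.0 * math.log10(a_v / abs(rel_error))


# Hypothetical example: R11 is 0.1 % high in an otherwise balanced bridge
err = bridge_mismatch(1.001e3, 10e3, 1e3, 10e3)   # about -1e-3
print(err, cmrr_db(100.0, err))                   # about 100 dB for A_V = 100
```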

Another shortcoming of the three-OpAmp approach is that the input CM range cannot include either the negative or the positive supply-rail voltage. This is a consequence of the feedback connection from the outputs of the input buffer amplifiers OA<sub>2</sub> and OA<sub>3</sub> to their inputs. Only when a level shift is built into the positive input nodes of these amplifiers can one of the rail voltages be reached.

## 4 Current-Feedback Instrumentation Amplifiers

The fundamentally best way to achieve a high CMRR is to convert the differential input signal $V_{id}$ into a type of signal that is insensitive to the CM voltage $V_{iCM}$. Such a signal could be a magnetic signal in a transformer, or a light signal between a light-emitting and a light-sensing diode. Staying closer to the electrical domain, an electrical current signal could also be used, if we can make it sufficiently insensitive to the CM voltage. For a circuit on a chip the latter method is preferable. Therefore, the differential input voltage $V_{id}$ is converted into a current and compared with the current from the conversion of the feedback part $V_{fb}$ of the output voltage $V_o$ [3]. The situation is shown in Fig. 8.

The first voltage-to-current converter  $G_{m21}$  converts the differential input voltage  $V_{id}$  into a first current. The second converter  $G_{m22}$  converts the feedback output signal  $V_{fb}$  into a second current. Both currents are subtracted and compared by a control amplifier  $G_{m1}$  that drives the output voltage. A resistor divider  $R_2, R_1$  determines the part  $V_{fb}$  of the output voltage  $V_o$  that is fed back. The gain of the whole amplifier will be:

$$A_V = (G_{m21}/G_{m22})(R_2 + R_1)/R_1 \quad (6)$$



**Fig. 8** Current-Feedback Instrumentation Amplifier

It is often difficult to make $G_{m21}$ and $G_{m22}$ accurately different, but we can make them accurately equal. In that case the gain of the amplifier simplifies to:

$$A_V = (R_2 + R_1)/R_1, \quad \text{for } G_{m21} = G_{m22} \quad (7)$$
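A minimal sketch of Eqs. (6) and (7), assuming ideal $G_m$ stages and hypothetical component values, shows that a ratio error between $G_{m21}$ and $G_{m22}$ translates one-to-one into a gain error. This is why matching, rather than absolute transconductance, matters in this architecture.

```python
def cfia_gain(gm21, gm22, r1, r2):
    """Closed-loop gain of the current-feedback IA, Eq. (6) (hypothetical helper)."""
    return (gm21 / gm22) * (r2 + r1) / r1


# Hypothetical values: 100 uS transconductors, R1 = 1 kOhm, R2 = 99 kOhm
ideal = cfia_gain(100e-6, 100e-6, 1e3, 99e3)       # Eq. (7): matched G_m gives 100
mismatched = cfia_gain(101e-6, 100e-6, 1e3, 99e3)  # 1 % G_m mismatch
gain_error = mismatched / ideal - 1.0               # about +1 %, equal to the mismatch
print(ideal, gain_error)
```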



**Fig. 9** Simple Circuit-Diagram of a Current-Feedback Instrumentation Amplifier

**Fig. 10** Symbol for a Current-Feedback Instrumentation Amplifier



The CMRR is now determined not by the matching of main components but only by the ratio of the $G_m$ to small parasitic conductances, which keeps the CMRR high.

The InstAmp is Miller compensated by the capacitors  $C_{M11}$  and  $C_{M12}$ .

A simple example of a current-feedback InstAmp is given in Fig. 9.

The input and feedback V-I converters are as simple as possible. They can be degenerated to increase the differential input voltage range if needed. Their linearity is not good in itself, but they match well enough for an accurate gain. The input CM voltage range may include the negative supply-rail voltage $V_{SN}$. This allows the output voltage $V_o$ to be referenced to $V_{SN}$. The input stages are followed by folded cascodes with a current mirror at their upper end. The push-pull output transistors are biased in class-AB by a class-AB mesh composed of $M_{39}$ and $M_{40}$ and proper bias voltages $V_{B5}$ and $V_{B6}$ (see Fig. 9.4.6 in [4]).

A general symbol for a current-feedback IA is given in Fig. 10. It shows that inside the IA there are two  $G_m$  stages: one for the input  $G_{mi}$  and one for the feedback  $G_{mfb}$ .

Interestingly, not only the input but also the output has a high CMRR. This means that we can connect the output reference voltage terminal $V_{oRef}$ to any voltage, as shown in Fig. 11. The voltage across the measuring resistor $R_M$ and the current



**Fig. 11** Universal Voltage-to-Current Converter with a Current-Feedback Instrumentation Amplifier

through $R_M$ are not influenced by the voltage on $V_{oRef}$. Hence, we obtain a voltage-controlled current source at the $V_{oRef}$ terminal. The whole topology of Fig. 11 acts as an accurate general-purpose V-I converter with a transconductance of $1/R_M$, so that $I_o = V_{id}/R_M$.

## 5 Auto-Zero OpAmps and InstAmps

In Section 2 we saw several applications that need low offset. Auto-zeroing and chopping are the main tools to obtain low offset.

In this section we start with auto-zeroing, which we first apply to an OA in order to reduce its offset. Out of the many ways to implement auto-zeroing, we have first chosen the simple method with switched capacitors at the input, as shown in Fig. 12.

The auto-zero OA consists of an auto-zeroing input stage  $G_{m2}$  with input CM control and a Miller compensated output stage  $G_{m1}$ .

Auto-zeroing has two phases. In phase 1 the forward path is broken and $G_{m2}$ is fully fed back, so that its offset appears at its input. With their inputs short-circuited together, the auto-zero capacitors $C_{AZ21}$ and $C_{AZ22}$ store this offset voltage. In phase 2, $G_{m2}$ is connected in the forward path and the auto-zero capacitors are connected to the input. Their stored offset voltage now compensates for the offset of $G_{m2}$, so that $G_{m2}$ shows no offset in phase 2.
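To see why the stored voltage cancels the offset, the following idealized behavioral model treats $G_{m2}$ as an inverting voltage stage with a finite DC gain and an input offset. This abstraction is our assumption, not the transistor circuit of Fig. 12. It ignores charge injection and $kT/C$ noise, so it only captures the residue $V_{os}/(1+A)$ left by finite gain; the practical limits discussed further on are larger.

```python
def autozero_residual_offset(v_os, gain):
    """Idealized two-phase auto-zero of an inverting stage (hypothetical model).

    Phase 1: the stage is in unity feedback with the capacitor's outer plate
    grounded, so its input node settles at v_x = -gain*v_os/(1 + gain),
    and this voltage is stored on the auto-zero capacitor.
    Phase 2: the signal drives the capacitor, so the stage amplifies
    v_in + v_stored + v_os. With v_in = 0 this is the input-referred residue.
    """
    v_stored = -gain * v_os / (1.0 + gain)   # phase 1: voltage held on C_AZ
    v_in = 0.0                               # zero input: output shows only offset
    return v_in + v_stored + v_os            # phase 2: equals v_os / (1 + gain)


# Hypothetical: 10 mV raw offset, stage gain of 1000
residual = autozero_residual_offset(10e-3, 1000.0)   # about 10 uV
print(residual)
```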

An improved auto-zero topology with storage capacitors at the output is shown in Fig. 13.

When the input switches $S_{21}$ and $S_{22}$ short-circuit the input and the auto-zero switches $S_{23}$ and $S_{24}$ are in the auto-zero position, the output current of $G_{m2}$ charges the capacitors $C_{31}$ and $C_{32}$ at its output until the correction amplifier $G_{m3}$ compensates this current. The output CM level of $G_{m2}$ is controlled.

The advantage of this topology is that the capacitors can store a larger voltage than in the preceding case. This means, for instance, that the capacitors and $G_{m3}$ can be made $10\times$ smaller for the same $kT/C$ noise and charge-injection errors. The offset of $G_{m3}$ is irrelevant because it is automatically included in the stored capacitor voltage.



**Fig. 12** Switched-Cap Auto-Zero OpAmp.  $V_{os} = 100 \mu V$



**Fig. 13** Auto-Zero OpAmp with storage capacitors  $C_{31}$  and  $C_{32}$  at the output and correction amplifier  $G_{m3}$ .  $V_{os} = \sim 20\mu\text{V}$

Importantly, the auto-zero action removes offset and 1/f noise. However, extra noise $V_{naz}$ is generated in the frequency range below $2f_{AZ}$ due to noise folding from the bandwidth BW of the local auto-zero feedback loop, as depicted in Fig. 14.

$$V_{naz} = V_n(\text{white}) \text{BW}^{1/2} / f_{az}^{1/2} \quad (8)$$
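Eq. (8) can be evaluated directly. The sketch below uses hypothetical numbers (10 nV/√Hz white noise, 1 MHz loop bandwidth, 10 kHz auto-zero frequency) to show that the folded baseband noise density rises by $\sqrt{\mathrm{BW}/f_{az}} = 10$ above the white-noise floor.

```python
import math


def autozero_noise_density(vn_white, bw, f_az):
    """Baseband noise density after auto-zeroing, Eq. (8):
    V_naz = V_n(white) * sqrt(BW / f_az)."""
    return vn_white * math.sqrt(bw / f_az)


# Hypothetical values: 10 nV/sqrt(Hz), 1 MHz AZ-loop bandwidth, 10 kHz AZ clock
vnaz = autozero_noise_density(10e-9, 1e6, 10e3)   # 100 nV/sqrt(Hz)
print(vnaz)
```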

A problem is that the auto-zero OA has no continuous-time transfer: when the output has to follow a ramp, the result is a staircase with steps at the clock frequency. Moreover, the noise increases by a factor $2^{1/2}$, because the amplifier is effectively used only half of the time. To overcome these problems, the ping-pong auto-zero concept [5] of Fig. 15 was invented.

In Fig. 15 two auto-zero input stages $G_{m21}$ and $G_{m22}$ are alternately connected between the input and the output stage in order to obtain continuous-time operation. The stage that is not connected has time to auto-zero itself. This allows the OA to be used in general continuous-time feedback configurations.

We can extend the principle of ping-pong to ping-pong-pang in order to obtain a suitable InstAmp topology, as shown in Fig. 16.

In Fig. 16 three auto-zero input stages $G_{m21}$, $G_{m22}$, and $G_{m23}$ are used. In sequence, two stages are connected to the output stage $G_{m1}$ while the third is in auto-zero mode. In this way a continuous-time IA is obtained whose offset and 1/f noise are strongly reduced by auto-zeroing.
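The rotation of stages can be sketched as a simple round-robin schedule: with two stages it reproduces ping-pong, and with three it reproduces ping-pong-pang. The strict round-robin order and the function name are assumptions for illustration; the actual clock sequencing of the circuits in Figs. 15 and 16 may differ in detail.

```python
def autozero_schedule(n_stages, n_phases):
    """Hypothetical round-robin schedule for ping-pong(-pang) auto-zeroing.

    In every clock phase exactly one input stage auto-zeros itself and the
    remaining n_stages - 1 stages are connected to the output stage.
    Returns a list of (stage_in_autozero, connected_stages) tuples.
    """
    schedule = []
    for phase in range(n_phases):
        az = phase % n_stages
        connected = tuple(s for s in range(n_stages) if s != az)
        schedule.append((az, connected))
    return schedule


# Ping-pong-pang (three stages): two stages always in the signal path
for az, connected in autozero_schedule(3, 3):
    labels = ", ".join(f"G_m2{s + 1}" for s in connected)
    print(f"G_m2{az + 1} auto-zeroing; connected: {labels}")
```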



**Fig. 14** Noise densities with and without auto-zeroing



**Fig. 15** Ping-Pong Auto-Zero OpAmp.  $V_{os} = \sim 100 \mu V$



**Fig. 16** Ping-Pong-Pang Auto-Zero InstAmp.  $V_{os} = 100 \mu V$

The offset reduction is limited by the parasitic capacitances of the capacitors and switches. When the input switches change from auto-zero mode to transfer mode and vice versa, parasitic capacitances to ground are charged and discharged. Any imbalance in this charge changes the offset voltage stored on the AZ capacitors. Offline auto-zeroing, as in Fig. 13, is therefore preferable.

In practice, auto-zeroing can reduce the offset by at most a factor on the order of 100 to 500, i.e., from 10 mV to 100 $\mu V$ or 20 $\mu V$.

Interestingly, the AZ function reduces not only the offset voltage but any differential input voltage at frequencies below the AZ frequency, provided the gain is sufficiently large. This means that the CMRR is also drastically increased.

## 6 Chopper OpAmps and InstAmps

Before we discuss the chopper IA, we will look at the chopper OA [6], which is depicted in Fig. 17. We assume a $6\sigma$ input offset of 10 mV for $G_{m2}$.

The choppers $Ch_2$ and $Ch_1$ alternately pass the signal through the input stage $G_{m2}$ straight and reversed. As a result, the input voltage $V_{id}$ appears as a continuous-time current at the output, whereas the input offset voltage $V_{os2}$ appears as a square-wave current superimposed on the output, as shown in Fig. 18.

If the OA is placed in a feedback application, the input voltage will show the residual offset voltage with a low-pass filtered square wave ripple on top of it.

In the noise spectrum, the offset and 1/f noise are now shifted to the clock frequency $f_{cl}$, where they appear as noise and ripple, as shown in Fig. 19.
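The modulation mechanism can be illustrated with a sampled, unity-gain behavioral model: $Ch_2$ multiplies the input by $m = \pm 1$, the stage adds its offset, and $Ch_1$ multiplies by $m$ again. The model, its parameter names, and its values are hypothetical and assume ideal choppers; it shows the signal emerging at DC while the offset becomes a square wave of $2V_{os2}$ peak-to-peak that averages to zero.

```python
def chopper_output(v_in, v_os, n_periods=4, samples_per_half=8):
    """Ideal unity-gain chopper amplifier (hypothetical behavioral model).

    Output sample = m * (m * v_in + v_os) = v_in + m * v_os, with m = +/-1.
    """
    out = []
    for k in range(2 * n_periods * samples_per_half):
        m = 1 if (k // samples_per_half) % 2 == 0 else -1  # chopper clock state
        stage_out = m * v_in + v_os                          # Ch2, then stage offset
        out.append(m * stage_out)                            # Ch1 demodulates
    return out


y = chopper_output(v_in=1e-3, v_os=10e-3)   # hypothetical 1 mV signal, 10 mV offset
average = sum(y) / len(y)                   # offset averages out over whole periods
ripple_pp = max(y) - min(y)                 # square-wave ripple of 2 * v_os
print(average, ripple_pp)
```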



**Fig. 17** Chopper OpAmp with continuous-time transfer.  $V_{os} = \sim 10 \mu V$ ,  $V_{rip} = \sim 10 \text{ mV}$



**Fig. 18** Voltage and Current Signals as function of time in a Chopper Amplifier

**Fig. 19** Noise densities in an Amplifier with and without chopping



The residual offset has two main origins. The first is clock skew in the chopper clocks: if the offset is 10 mV and the relative clock skew is $10^{-4}$, the resulting offset is 1 $\mu$V. The second is imbalance of the parasitic capacitances in the choppers, which are shown in Fig. 20.
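The clock-skew estimate above is a simple product. The sketch assumes, as a first-order estimate, that the relative skew is the mismatch of the chopper clock edges expressed as a fraction of the clock period, so that the same fraction of the raw offset fails to be cancelled; the function name is hypothetical.

```python
def skew_offset(v_os_initial, relative_skew):
    """First-order residual offset from chopper clock skew (hypothetical helper):
    the uncancelled fraction of the raw offset equals the relative skew."""
    return v_os_initial * relative_skew


# Numbers from the text: 10 mV raw offset, relative skew of 1e-4
print(skew_offset(10e-3, 1e-4))   # about 1 uV
```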

Suppose that chopper $Ch_1$ (between the input and output stages) has only the capacitors $C_{p11}$ and $C_{p12}$ around transistor $M_1$. The capacitor $C_{p12}$ produces alternating positive and negative current spikes at the output of chopper $Ch_1$. This does not contribute to the offset. However, capacitor $C_{p11}$ also produces alternating spike currents at the input of chopper $Ch_1$. On their way to the output, these alternating spike currents are rectified by the action of chopper $Ch_1$.

The rectified spikes represent an average DC current, which is an offset. Fortunately, the chopper is fully balanced. Hence, charge injection from the clock in one transistor cancels that of the other. But every imbalance in the layout causes a net offset. For chopper $Ch_2$ at the input, the capacitor $C_{p22}$ injects alternating current spikes on the clock edges. These spikes are translated into rectified input voltage spikes across the series impedances of the chopper and the input signal source. These rectified spikes also reach the output as a net offset. In practice, offset voltages below 1 $\mu$V can be obtained if the choppers, their clock lines, and the signal lines are carefully balanced in the layout. A common practice is to lay out the clock lines as coaxial lines on the chip.

In our quest for low offset, noise, and ripple we see two contradictory effects. On the one hand, the higher the clock frequency, the smaller the ripple at the output and the lower the 1/f noise residue. On the other hand, a higher clock frequency gives a higher residual offset caused by clock skew and charge injection. This contradiction can be relieved by replacing each original chopper with two choppers in series in a nested-chopper configuration [7], according to Fig. 21.



**Fig. 20** Charge injection current in  $C_{p11}$  of Chopper  $Ch_1$  gets rectified at output



**Fig. 21** Nested-Chopper Operational Amplifier with better compromise between 1/f noise, ripple, and offset.  $V_{os} = \sim 0.1 \mu\text{V}$ ,  $V_{rip} = \sim 100 \mu\text{V}$

The inner choppers $\text{Ch}_{211}$ and $\text{Ch}_{11}$ can be clocked at a 100 times higher frequency $\text{Cl}_H$ to overcome 1/f noise and ripple, while the outer choppers $\text{Ch}_{221}$ and $\text{Ch}_{12}$ are clocked at a 10 times lower frequency $\text{Cl}_L$ to remove the residual offset caused by the charge injection of the inner choppers. This architecture can lead to offset voltages as low as about $0.1 \mu\text{V}$. However, a small filtered input-referred ripple of about $100 \mu\text{V}$ at $\text{Cl}_H$ still remains due to the original offset, as well as an even smaller ripple at $\text{Cl}_L$ due to the charge injection of $\text{Ch}_{11}$.
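The benefit of nesting can be illustrated with a sampled, unity-gain behavioral model. It is a hypothetical abstraction of Fig. 21: the inner chopper pair removes the amplifier offset, the residual offset that the inner choppers create (for example by charge injection) is modelled as a DC error just after the inner demodulator, and the outer pair moves that error to $\text{Cl}_L$. The frequency ratio of 10 and all values are assumptions. Averaged over whole $\text{Cl}_L$ periods only the signal survives, while both errors remain as ripple, matching the description above.

```python
def nested_chopper_output(v_in, v_os_amp, v_res_inner, ratio=10, n_outer=4):
    """Ideal nested chopper amplifier, unity gain (hypothetical model).

    One sample per inner half period; Cl_L = Cl_H / ratio.
    """
    if ratio % 2:
        raise ValueError("ratio must be even so each outer half period "
                         "holds whole inner periods")
    out = []
    for k in range(2 * n_outer * ratio):
        m_h = 1 if k % 2 == 0 else -1              # inner clock Cl_H
        m_l = 1 if (k // ratio) % 2 == 0 else -1   # outer clock Cl_L
        x = m_h * (m_l * v_in)                     # outer, then inner input chopper
        x = m_h * (x + v_os_amp) + v_res_inner     # amp offset, inner demod + residue
        out.append(m_l * x)                        # outer output chopper
    return out


# Hypothetical: 1 mV signal, 10 mV amplifier offset, 50 uV inner-chopper residue
y = nested_chopper_output(v_in=1e-3, v_os_amp=10e-3, v_res_inner=50e-6)
dc = sum(y) / len(y)   # equals v_in: both error terms are chopped away
print(dc)
```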

Another way to reduce the ripple is to combine auto-zeroed amplifiers in a ping-pong fashion with a chopper amplifier in order to obtain a low-ripple continuous-time signal transfer [8]. The block diagram is shown in Fig. 22.

The choppers $\text{Ch}_1$ and $\text{Ch}_2$ chop the signal alternately positive and negative through the whole set of two ping-pong auto-zeroing amplifiers $G_{m21}$ and $G_{m22}$. Within a full clock cycle, the switches $S_{211}$ through $S_{222}$ and $S_{213}$ through $S_{224}$ sequentially put the amplifiers $G_{m21}$ and $G_{m22}$ in transfer or auto-zero mode.

The capacitors $C_{311}$ through $C_{322}$ differentially store the auto-zero correction voltages, and the transconductances $G_{m31}$ and $G_{m32}$ correct the offsets of the amplifiers $G_{m21}$ and $G_{m22}$, respectively.



**Fig. 22** Operational Chopper Amplifier with Ping-Pong auto-zero input stages.  $V_{os} = \sim 3 \mu\text{V}$ ,  $V_{rip} = \sim 10 \mu\text{V}$

**Fig. 23** Noise in an Operational Chopper Amplifier with Ping-Pong auto-zero input stages



The auto-zero switches $S_{213}$ through $S_{224}$ switch the outputs of $G_{m21}$ and $G_{m22}$ between the stored voltages on the auto-zero capacitors and the input offset voltage of the output stage. This causes some extra charge injection. The amplifier achieves an offset of $3 \mu\text{V}$ and an input-referred ripple on the order of $10 \mu\text{V}$. The choppers now transpose the noise of the auto-zero amplifiers to the clock frequency, which keeps the low frequencies cleaner, as shown in Fig. 23.

An advantage of the ping-pong continuous-time topology is the simplicity of its frequency compensation: it requires only one set of Miller-compensation capacitors.

A chopper instrumentation amplifier can be constructed if we use two input stages  $G_{m21}$  and  $G_{m22}$ , each preceded by a chopper,  $Ch_{21}$  and  $Ch_{22}$ , respectively. This situation is shown in Fig. 24.

The gain is:

$$A_v = ((R_1 + R_2)/R_1)(G_{m21}/G_{m22}) \quad (9)$$

The accuracy of the instrumentation amplifier depends entirely on the equality of $G_{m21}$ and $G_{m22}$. In “**Low-Voltage Power-Efficient Amplifiers for Emerging Applications**” we will discuss ways to increase the accuracy of $G_m$ stages. Even



**Fig. 24** Chopper Instrumentation Amplifier.  $V_{os} = \sim 20 \mu\text{V}$ ,  $V_{rip} = \sim 20 \text{ mV}$



**Fig. 25** Nested Chopper Instrumentation Amplifier with better compromise between 1/f noise, ripple, and offset.  $V_{os} = \sim 0.2 \mu\text{V}$ ,  $V_{rip} = \sim 200 \mu\text{V}$

with an ordinary differential pair in weak inversion, and well matched tail currents, an accuracy better than 1% can easily be achieved without trimming.

The CMRR is also strongly increased by the chopper function for frequencies below the clock frequency. Chopping can easily add 60 dB to the CMRR. The improvement is limited, firstly, by clock skew in the chopper clocks and, secondly, by unequal modulation of the charge-injection spikes in the choppers as a function of the CM voltage. The resulting offset can be as low as $20 \mu\text{V}$, twice that of the chopper OpAmp, with an input-referred ripple of $20 \text{ mV}$, also twice that of the OpAmp. The factor 2 is an estimate; it results from the fact that there are two parallel input stages, each of which has more offset due to degeneration.

To improve the offset and ripple, we may also apply the nested-chopper principle [7] to the chopper instrumentation amplifier, as shown in Fig. 25. This yields a better compromise between chopper ripple and 1/f noise on the one hand and residual offset on the other, as explained for Fig. 21. An offset on the order of $0.2 \mu\text{V}$ and a residual ripple on the order of $200 \mu\text{V}$ can be achieved.

## 7 Chopper-Stabilized OpAmps and InstAmps

The output ripple from a chopper amplifier invites us to search for ways to reduce it. The chopper-stabilized amplifier is one of the best approaches [9]. A basic chopper-stabilized OA topology is shown in Fig. 26.

The basic OA is composed of two stages  $G_{m1}$  and  $G_{m2}$ . The output stage  $G_{m1}$  is Miller compensated by  $C_{M11}$  and  $C_{M12}$ . The input stage  $G_{m2}$  forms the 'high-frequency' path. The CM level at the output of  $G_{m2}$  is controlled at  $V_{CMo2}$ .

Now consider the offset $V_{os2}$ of the input stage $G_{m2}$. When the OA is placed in a feedback loop, $V_{os2}$ appears at the input. This input error voltage $V_{id}$ is measured and corrected by the chopper amplifier's 'gain' path. This path starts



**Fig. 26** Chopper-Stabilized Operational Amplifier with multipath hybrid-nested Miller compensation.  $V_{os} = \sim 10 \mu\text{V}$ ,  $V_{rip} = \sim 100 \mu\text{V}$

with an input chopper $\text{Ch}_2$ that translates the input error voltage $V_{id}$ into a square wave. The sense amplifier $G_{m5}$ produces a square-wave output current proportional to $V_{id}$ together with a DC output current due to its own DC offset $V_{os5}$. The chopper $\text{Ch}_1$ chops the square-wave current back to a DC error current, while the DC offset current is changed into a square-wave current. The square-wave current due to the offset of $G_{m5}$ is filtered out by the integrator $G_{m4}$, while the DC current proportional to the input error voltage $V_{id}$ is integrated and strongly amplified by the DC gain of the integrator $G_{m4}$. Finally, the integrated error voltage is added through $G_{m3}$ to the output current of the input amplifier $G_{m2}$. Note that the output CM levels of $G_{m5}$ and $G_{m4}$ have to be controlled to $V_{CMo5}$ and $V_{CMo4}$, respectively.

We have now obtained a two-path amplifier: a high-frequency low-gain path through $G_{m2}$, and a low-frequency high-gain path through $G_{m5}$, $G_{m4}$, and $G_{m3}$. The offset can only be reduced to the extent that the gain of the high-gain path exceeds that of the low-gain path.
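The last statement can be made quantitative with a DC current balance at the input of the output stage. In the sketch below the low-frequency path is lumped into one effective transconductance $G_{m3} R_{4} G_{m5}$, where $R_4$ is a hypothetical DC transimpedance representing the finite DC gain of the integrator $G_{m4}$. The sketch assumes the chopped offset of $G_{m5}$ is fully filtered. Forcing the net current to zero gives a residual offset of roughly $V_{os2}$ divided by the ratio of the two path transconductances.

```python
def chopper_stab_residual_offset(v_os2, gm2, gm5, r4_dc, gm3):
    """DC input-referred offset of the two-path amplifier (hypothetical model).

    Closed loop drives the net current into the output stage to zero:
        gm2 * (v_id + v_os2) + gm3 * r4_dc * gm5 * v_id = 0
    r4_dc is an assumed DC transimpedance of the integrator G_m4.
    """
    g_lf = gm3 * r4_dc * gm5              # effective DC transconductance, LF path
    return -gm2 * v_os2 / (gm2 + g_lf)    # about -v_os2 * gm2 / g_lf


# Hypothetical: 10 mV offset, all G_m = 100 uS, integrator R4 = 100 MOhm
v_res = chopper_stab_residual_offset(10e-3, 100e-6, 100e-6, 1e8, 100e-6)
print(v_res)   # about -1 uV: the LF path is about 10^4 times stronger
```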

One of the old struggles with chopper stabilization is that the two poles in the gain path lead to a roll-off that is not a straight 6 dB per octave, as shown in Fig. 27.

This problem can be solved in practice by applying the principle of hybrid nesting as described in [10]. To that end we connect two hybrid-nested Miller capacitors  $C_{M31}$  and  $C_{M32}$  from the final output to the input of the integrator  $G_{m4}$ .

If we choose the bandwidth of the two-stage Miller-compensated HF amplifier path equal to the bandwidth of the four-stage hybrid-nested Miller loop, the overall frequency characteristic becomes straight from very low frequencies up to the bandwidth of the OA. Therefore we choose $G_{m2}/(C_{M11} \text{ in series with } C_{M12}) = G_{m5}/(C_{M31} \text{ in series with } C_{M32})$. The result is a straight frequency characteristic, as shown in Fig. 27.
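The matching condition translates into a simple sizing rule. The sketch below assumes, for illustration only, that the two hybrid-nested capacitors are chosen equal ($C_{M31} = C_{M32}$), and it uses hypothetical transconductances and Miller capacitors.

```python
def series(c_a, c_b):
    """Series combination of two capacitors."""
    return c_a * c_b / (c_a + c_b)


def hybrid_nested_cap(gm2, gm5, c_m11, c_m12):
    """Size equal C_M31 = C_M32 (assumption) so that
    gm2 / (C_M11 in series with C_M12) == gm5 / (C_M31 in series with C_M32)."""
    c_m3_series = series(c_m11, c_m12) * gm5 / gm2   # required series value
    return 2.0 * c_m3_series                         # each of two equal capacitors


# Hypothetical: G_m2 = 200 uS, G_m5 = 50 uS, C_M11 = C_M12 = 10 pF
c_m31 = hybrid_nested_cap(200e-6, 50e-6, 10e-12, 10e-12)
print(c_m31)   # 2.5 pF each, i.e. 1.25 pF in series
```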

The low- frequency behavior, and thus the offset of the whole amplifier is determined by that of the chopper loop. That means that we have to carefully balance the

**Fig. 27** Amplitude Characteristic of a Chopper-Stabilized amplifier with and without hybrid-nested Miller capacitors  $C_{M31}$  and  $C_{M32}$



parasitic capacitors $C_{p11}$ and $C_{p22}$ of the choppers $Ch_1$ and $Ch_2$ and their layout. The clock skew of the chopper clocks also determines the offset: if the relative clock skew is $10^{-4}$ and the $6\sigma$ offset of the chopper amplifier is 10 mV, an offset of 1 $\mu$V results.

There is one more source of offset to watch for. It is caused by the combination of the parasitic capacitor $C_{p5}$ between the outputs of $G_{m5}$ and the offset $V_{os4}$ of the integrator amplifier. The chopper $Ch_3$ chops this offset voltage back and forth across $C_{p5}$ while rectifying the resulting current spikes into a DC current $I_{p5}$ at the input of the integrator, equal to:

$$I_{p5} = 4 V_{os4} C_{p5} f_{cl} \quad (10)$$

This current can no longer be distinguished from the DC output current of the chopper sense amplifier, which is also presented at the input of the integrator. The resulting input offset $V_{osi}$ is:

$$V_{osi} = I_{p5}/G_{m5} = 4 V_{os4} C_{p5} f_{cl}/G_{m5} \quad (11)$$

The resulting input-referred offset is smaller than 1 $\mu$V only if we take measures to keep $C_{p5}$ small, i.e., on the order of 0.1 pF. We can always chopper-stabilize the integrator amplifier to further reduce this offset component.
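Eqs. (10) and (11) can be checked numerically. The operating point below (10 mV integrator offset, 0.1 pF, 10 kHz chopper clock, $G_{m5}$ = 100 µS) is hypothetical and chosen only to show that a parasitic capacitance of about 0.1 pF keeps this contribution below 1 µV.

```python
def cp5_offset(v_os4, c_p5, f_cl, gm5):
    """Input-referred offset from C_p5 and the integrator offset V_os4."""
    i_p5 = 4.0 * v_os4 * c_p5 * f_cl   # rectified DC current into integrator, Eq. (10)
    return i_p5 / gm5                  # referred to the input through G_m5, Eq. (11)


# Hypothetical: 10 mV, 0.1 pF, 10 kHz, 100 uS
v_osi = cp5_offset(10e-3, 0.1e-12, 10e3, 100e-6)
print(v_osi)   # about 0.4 uV
```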

The input-referred ripple has now been reduced by roughly two orders of magnitude, from a square wave of about 10 mV in the chopper amplifier to a triangle wave of about 50 $\mu$V in the chopper-stabilized amplifier. To decrease the ripple further, we can auto-zero the chopper amplifier [11], as shown in Fig. 28.

We now have a chopper-stabilized amplifier in which the chopper amplifier is auto-zeroed. In this way the ripple can be further reduced to the 1 $\mu$V level. The noise spectrum of such an amplifier is shown in Fig. 29.

An interesting alternative way to reduce the ripple is to use a sample-and-hold after the integrator [12], as shown in Fig. 30.



**Fig. 28** Chopper-Stab. OpAmp with auto-zero  $G_{m5}$ .  $V_{os} = \sim 1 \mu V$ ,  $V_{rip} = \sim 1 \mu V$

**Fig. 29** Noise densities of a chopper-stabilized multi-path Instrumentation Amplifier with and without auto-zeroing



In this design two passive integrators have been connected as a ping-pong sample and hold with  $C_{41}$ ,  $C_{42}$ , and  $C_H$ . The design is simple and elegant and has an offset of  $3 \mu V$ , while the ripple is on the order of  $20 \mu V$ .

Next, the step has to be made to an instrumentation amplifier. To that end, the chopper-stabilized OA must be transformed into the current-feedback IA architecture [13]. The circuit is shown in Fig. 31.



**Fig. 30** Chopper-Stabilized OpAmp with passive integrator and sample & hold



**Fig. 31** Chopper-Stabilized InstAmp with multipath hybrid-nested Miller comp.  $V_{os} = \sim 20 \mu\text{V}$ ,  $V_{rip} = \sim 200 \mu\text{V}$

The IA has a HF path through  $G_{m21}$  and  $G_{m22}$  and a LF gain path through  $G_{m51}$  and  $G_{m52}$ . The LF gain path not only determines the offset and CMRR, but also sets the gain accuracy at low frequencies.

The gain at low frequencies is:

$$A_{VL} = (G_{m51}/G_{m52})((R_1 + R_2)/R_1) \quad (12)$$

and at high frequencies:

$$A_{VH} = (G_{m21}/G_{m22})((R_1 + R_2)/R_1) \quad (13)$$
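The two gain expressions can be compared with a short sketch. It assumes that both paths see the same feedback-network factor $(R_1 + R_2)/R_1$ as in Eq. (12) and differ only in their $G_m$ ratio; the component values, including a 2 % mismatch in the HF pair, are hypothetical. The low-frequency gain stays exact while only the high-frequency gain inherits the HF mismatch.

```python
def two_path_gains(gm51, gm52, gm21, gm22, r1, r2):
    """Low- and high-frequency gains of the chopper-stabilized IA
    (hypothetical helper; both paths share the factor (R1 + R2)/R1)."""
    beta_inv = (r1 + r2) / r1
    a_vl = (gm51 / gm52) * beta_inv   # LF (chopper) path sets low-frequency gain
    a_vh = (gm21 / gm22) * beta_inv   # HF path sets high-frequency gain
    return a_vl, a_vh


# Hypothetical: matched LF pair, HF pair mismatched by 2 %, R1 = 1 k, R2 = 99 k
a_vl, a_vh = two_path_gains(100e-6, 100e-6, 102e-6, 100e-6, 1e3, 99e3)
print(a_vl, a_vh)   # 100 at low frequencies, about 102 at high frequencies
```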

An offset on the order of $20 \mu\text{V}$ and a ripple of about $200 \mu\text{V}$ can be obtained. The offset and ripple are about a factor 2 larger than in the OA case because there are two input stages in parallel in both the HF and the LF gain path. The noise is also $2^{1/2}$ times larger than in the OA case.

If we want to further reduce offset and ripple the chopper amplifiers can be auto-zeroed as in the OA case [13]. The resulting block diagram is shown in Fig. 32.

This topology may result in an input referred offset voltage lower than  $2 \mu\text{V}$  and a ripple lower than  $2 \mu\text{V}$ .



**Fig. 32** Chopper-Stabilized InstAmp with auto-zero sense amplifiers.  $V_{os} = \sim 2 \mu\text{V}$ ,  $V_{rip} = \sim 2 \mu\text{V}$

## 8 Chopper-Stabilized and AZ Chopper OpAmps and InstAmps

The smooth, continuous-time chopper amplifier is the best approach to low offset. However, a 0.01% clock skew multiplied by an initial $6\sigma$ offset voltage of 10 mV of the first stage of a CMOS amplifier sets a lower limit on the residual offset of about 1 $\mu\text{V}$. Moreover, the main disadvantage of the chopper amplifier is the chopper-induced square-wave ripple, which, referred to the input, equals the initial offset of about 10 mV (at $6\sigma$). Hence, the ripple and offset of the input amplifier must be further reduced.

The next step of improvement is the chopper-stabilized chopper amplifier [14]. The topology is shown in Fig. 33.

If an amplifier has a high loop gain, the differential input voltage becomes zero except for the input offset voltage. In the chopper-stabilized chopper amplifier of Fig. 33 this means that the right-hand side of chopper $Ch_2$ sees $V_{os2}$. Hence, the left-hand input side carries a square-wave voltage equal to $V_{os2}$. This allows us to connect the correction amplifier $G_{m5}$ directly to the input without an extra chopper. The chopper-stabilizer loop itself needs no further discussion, because it was already discussed with Fig. 26. However, there are major differences.

Firstly, the first stage of the main amplifier now determines the noise at low frequencies, while the correction loop determines the ripple at the clock frequency. Secondly, the hybrid-nested capacitors $C_{M31}$ and $C_{M32}$ are no longer connected to the input of the integrator, but to the input of chopper $Ch_3$, in order to maintain continuous negative feedback in the loop including $Ch_1$ [10].



**Fig. 33** Chopper-Stabilized Chopper OpAmp with multipath hybrid-nested Miller compensation.  
 $V_{os} = \sim 1 \mu V$ ,  $V_{rip} = \sim 50 \mu V$

This means that the parasitic capacitance $C_{p5}$ at the output of the sense amplifier is now increased by the series connection of $C_{M31}$ and $C_{M32}$. To avoid the extra offset from this parasitic capacitance in combination with the offset $V_{os4}$ of the integrator, either $V_{os4}$ has to be reduced, or $C_{M31}$ and $C_{M32}$ can be connected through the folded cascode at the output of $G_{m5}$. Thirdly, the parasitic capacitor $C_{p2}$ before chopper $Ch_1$ is now charged and discharged to the offset voltage $V_{os1}$ of the output stage $G_{m1}$. Through the first set of Miller capacitors $C_{M11}$ and $C_{M12}$, this causes spikes at the output with an amplitude of $V_{os1} C_{p2}/C_{M1}$, where $C_{M1} = C_{M11} C_{M12}/(C_{M11} + C_{M12})$. Therefore, the parasitic capacitance $C_{p2}$ at the output of $G_{m2}$ and $G_{m3}$ needs to be small.
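The spike-amplitude expression is easy to evaluate. The values below (10 mV output-stage offset, 0.2 pF parasitic capacitance, two 10 pF Miller capacitors) are hypothetical and only illustrate why $C_{p2}$ must be kept small relative to the series Miller capacitance.

```python
def chopper_spike(v_os1, c_p2, c_m11, c_m12):
    """Output spike amplitude V_os1 * C_p2 / C_M1,
    with C_M1 = C_M11 in series with C_M12 (hypothetical helper)."""
    c_m1 = c_m11 * c_m12 / (c_m11 + c_m12)
    return v_os1 * c_p2 / c_m1


# Hypothetical: 10 mV, 0.2 pF, 10 pF + 10 pF (C_M1 = 5 pF)
spike = chopper_spike(10e-3, 0.2e-12, 10e-12, 10e-12)
print(spike)   # about 0.4 mV
```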

The offset of  $G_{m5}$  causes a triangular ripple at the output of the integrator and, through  $Ch_1$ , a saw-tooth-like ripple at the output. This can be eliminated if the offset of the sense amplifier  $G_{m5}$  is auto-zeroed, similar to the chopper-stabilized amplifier of Fig. 28. To further reduce the offset caused by the parasitic capacitor  $C_{p5}$  in combination with the offset of the integrator amplifier  $G_{m4}$ , this amplifier can also be auto-zeroed by an extra loop around it [14]. These features are shown in Fig. 34. In this way, an offset of  $0.1\ \mu\text{V}$  can be achieved with a ripple lower than  $2\ \mu\text{V}$ . Nanosecond chopper spikes of several mV can be observed at the input and output.

A chopper-stabilized chopper instrumentation amplifier results when the HF and LF amplifier paths are doubled [15], as shown in Fig. 35. In contrast to the chopper-stabilized IA of Section 7, the gain of a chopper IA is not set by the ratio of  $G_{m51}$  and  $G_{m52}$  of the correction loop, but by the ratio of  $G_{m21}$  and  $G_{m22}$  of the main amplifier together with the feedback network:

$$A_v = \frac{G_{m21}}{G_{m22}} \cdot \frac{R_1 + R_2}{R_1} \quad (14)$$
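A small sketch of (14), with hypothetical transconductance and resistor values, shows how the transconductance ratio scales the usual non-inverting feedback factor  $(R_1 + R_2)/R_1$ . The function name is illustrative only.

```python
def chopper_ia_gain(g_m21: float, g_m22: float, r1: float, r2: float) -> float:
    """Voltage gain of (14): (G_m21 / G_m22) * (R1 + R2) / R1."""
    return (g_m21 / g_m22) * (r1 + r2) / r1


# Hypothetical: G_m21 = 10 * G_m22 and a feedback network with (R1 + R2)/R1 = 10.
print(chopper_ia_gain(1e-3, 100e-6, 1e3, 9e3))
```

With equal transconductances the expression reduces to the familiar non-inverting gain  $1 + R_2/R_1$ ; the ratio  $G_{m21}/G_{m22}$  then multiplies that value.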

The sense amplifiers  $G_{m51}$  and  $G_{m52}$  do not set the gain by their ratio, because the choppers around the main amplifier shift their influence to the clock frequency.  $G_{m52}$  senses the feedback ripple that results from the offset of  $G_{m21}$  and  $G_{m22}$ . The output current of  $G_{m52}$  is rectified by chopper  $Ch_3$ , amplified by the integrator  $G_{m4}$ , and coupled by  $G_{m3}$  to the output of  $G_{m21}$  and  $G_{m22}$  in order to compensate the ripple due to offset in the main chopper path. The signal-dependent part of the feedback at the input of  $G_{m52}$  is compensated for by the signal-dependent part at the input of  $G_{m51}$ . Therefore, the signal does not interfere with the offset cancellation. The offset of the correction amplifiers  $G_{m51}$  and  $G_{m52}$  is chopped into a square wave by chopper  $Ch_3$ . The integrator does not amplify this square wave, but converts it into a small triangular wave. When referred to the input, this ripple has to pass through chopper  $Ch_{21}$ , so its shape becomes a small saw-tooth at twice the clock frequency.



**Fig. 34** Chopper-Stabilized Chopper OpAmp with multipath hybrid-nested Miller compensation, auto-zero  $G_{m5}$  and  $G_{m4}$ .  $V_{os} \approx 0.1\ \mu\text{V}$ ,  $V_{rip} \approx 1\ \mu\text{V}$



**Fig. 35** Chopper-Stabilized Chopper InstAmp with multipath hybrid-nested Miller compensation.  $V_{os} = 2\ \mu\text{V}$ ,  $V_{rip} \approx 200\ \mu\text{V}$



**Fig. 36** Chopper-Stabilized Chopper InstAmp with multipath hybrid-nested Miller comp. and auto-zero  $G_{m5}$  and  $G_{m4}$ .  $V_{os} = 0.2\ \mu\text{V}$ ,  $V_{rip} \approx 2\ \mu\text{V}$
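The waveform argument above (square wave, then triangle, then saw-tooth at twice the clock frequency) can be checked with an idealized sketch. A unit square wave stands in for the chopped offset of  $G_{m51}$  and  $G_{m52}$ , a running sum stands in for the integrator  $G_{m4}$ , and multiplication by the same square wave stands in for chopper  $Ch_{21}$ . Perfect synchronization of  $Ch_3$  and  $Ch_{21}$ , an ideal integrator and the function name are assumptions of this sketch.

```python
def chopped_triangle_ripple(n: int = 8) -> list[float]:
    """One clock period of the idealized ripple path: square wave -> integrator -> chopper."""
    half = n // 2
    square = [1.0] * half + [-1.0] * half        # chopped offset of G_m51/G_m52 (normalized)
    triangle, acc = [], 0.0
    for s in square:                             # ideal integrator G_m4: running sum
        acc += s
        triangle.append(acc)
    mean = sum(triangle) / n
    triangle = [t - mean for t in triangle]      # keep only the AC part; the loop sets the DC level
    return [t * s for t, s in zip(triangle, square)]  # chopper Ch_21, same clock as Ch_3


ripple = chopped_triangle_ripple(8)
print(ripple)
```

Both halves of the clock period contain the same rising ramp, i.e. the triangle has become a saw-tooth that repeats at twice the clock frequency.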

The next step to reduce the saw-tooth ripple is to auto-zero the sense stages  $G_{m51}$  and  $G_{m52}$  [15]. This is shown in Fig. 36.

The most important remaining offset contribution of the chopper-stabilized chopper instrumentation amplifier comes from the parasitic capacitance  $C_{p5}$  at the output of  $G_{m5}$  in combination with the offset voltage  $V_{os4}$  at the input of  $G_{m4}$ ; see (10). This is particularly important because the hybrid-nested Miller capacitors  $C_{M31}$  and  $C_{M32}$  are connected in parallel to the parasitic capacitor  $C_{p5}$  at the output of  $G_{m5}$ . To further reduce this offset component,  $G_{m4}$  is auto-zeroed as well, as shown in Fig. 36. In this way, the final offset can be reduced to values well below  $0.2\ \mu\text{V}$  with a ripple lower than  $2\ \mu\text{V}$ .

Keep in mind that the voltage gain of the correction loop ( $G_{m5}$ ,  $G_{m4}$ ,  $G_{m3}$ ) must be made  $10^4$  times larger than the voltage gain of  $G_{m2}$  in order to reduce its offset to  $0.4\ \mu\text{V}$  and its ripple from  $10\ \text{mV}$  to a level of  $10\ \mu\text{V}$ .

## 9 Summary Low Offset

Table 1 gives an overview of the offset and ripple that can be obtained with the operational amplifiers and instrumentation amplifiers discussed in this chapter.

**Table 1** Summary of offset and ripple that can be obtained

| OpAmps     | Vos              | Vrip        | InstAmps   | Vos            | Vrip        |
|------------|------------------|-------------|------------|----------------|-------------|
| AZ         | 20–100 $\mu$V    |             | AZ         | 20–100 $\mu$V  |             |
| Chopper    | 10 $\mu$V        | 10 mV       | Chopper    | 20 $\mu$V      | 20 mV       |
| N Chopper  | 0.1 $\mu$V       | 100 $\mu$V  | N Chopper  | 0.2 $\mu$V     | 200 $\mu$V  |
| ChSt       | 10 $\mu$V        | 100 $\mu$V  | ChSt       | 20 $\mu$V      | 200 $\mu$V  |
| ChSt+AZ    | 1 $\mu$V         | 1 $\mu$V    | ChSt+AZ    | 2 $\mu$V       | 2 $\mu$V    |
| Ch+ChSt    | 1 $\mu$V         | 100 $\mu$V  | Ch+ChSt    | 2 $\mu$V       | 200 $\mu$V  |
| Ch+ChSt+AZ | 0.1 $\mu$V       | 1 $\mu$V    | Ch+ChSt+AZ | 0.2 $\mu$V     | 2 $\mu$V    |

Chopping can generally reduce the offset by a factor of 10,000, but without further measures the ripple remains equal to the original offset. Auto-zeroing reduces the offset by a factor of 100 to 500, depending on whether the AZ storage capacitors are placed at the input or at the output. Further improvement can be obtained by combining chopping and auto-zeroing. The abbreviations used in Table 1 are: AZ = Auto-Zeroing, N = Nested, ChSt = Chopper-Stabilized, Ch = Chopping.
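The basic chopping mechanism can be illustrated with an idealized discrete-time model: an input chopper, an amplifier that adds its offset, and a synchronous output chopper. This is a sketch under simplifying assumptions (ideal switches, no bandwidth limit, no charge injection, hypothetical values and function name), not a model of any specific circuit in Table 1. It shows that the offset is moved into a square-wave ripple whose amplitude equals the amplified offset, while averaging over whole clock periods recovers the amplified signal.

```python
def chopped_amplifier(v_in: float, v_os: float, gain: float,
                      n_periods: int = 4, samples_per_half: int = 5) -> list[float]:
    """Ideal chopper amplifier: input chopper, amplifier with offset, output chopper."""
    out = []
    for k in range(2 * n_periods * samples_per_half):
        m = 1.0 if (k // samples_per_half) % 2 == 0 else -1.0  # chopper clock (+1 / -1)
        v_amp_out = gain * (m * v_in + v_os)   # input chopper, then the amplifier adds its offset
        out.append(m * v_amp_out)              # output chopper demodulates signal, modulates offset
    return out


out = chopped_amplifier(v_in=1e-3, v_os=5e-3, gain=100.0)  # hypothetical values
average = sum(out) / len(out)          # what an ideal low-pass filter would keep
ripple = (max(out) - min(out)) / 2     # amplitude of the square-wave ripple
print(average, ripple)
```

The average equals  $gain \cdot v_{in}$  and the ripple amplitude equals  $gain \cdot V_{os}$ , which is why the chopper-stabilized and auto-zeroed variants in Table 1 are needed to suppress the ripple as well.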

## References

1. Johan Huijsing, “Operational Amplifiers, Theory and Design”, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2001, 456 pages, Chapter 1.
2. P.C. Pham, J. McDonald, and P. McDevitt, “A 2.5 Gb/s 32:1/1:32 Sonet Mux/DeMux Chip Set”, Proceedings of the ISSCC, IEEE, Feb. 1996, pp. 120–121.
3. Bernard van den Dool, and Johan Huijsing, “Indirect Current-Feedback Instrumentation amplifier with a common-mode input range that includes the negative rail”, IEEE Journal of Solid-State Circuits, Vol. 28, No. 7, July 1993, pp. 743–749.
4. K. de Langen and J.H. Huijsing, “Compact low-voltage power-efficient operational amplifier cells for VLSI,” IEEE JSSC, Vol. 33, No. 10, Oct. 1998, pp. 1482–1496.
5. I.E. Opris, and G.T.A. Kovacs, “A rail-to-rail ping-pong OpAmp”, IEEE Journal of Solid-State Circuits, Vol. 31, No. 9, Sept. 1996, pp. 1320–1324.
6. Christian Enz, Eric Vittoz, and F. Krummenacher, “A CMOS chopper amplifier”, IEEE Journal of Solid-State Circuits, Vol. 22, No. 3, pp. 708–715, June 1987.
7. Anton Bakker, Kevin Thiele, and Johan Huijsing, “A CMOS Nested Chopper Instrumentation Amplifier with 100 nV Offset”, IEEE Journal of Solid-State Circuits, Vol. 35, No. 12, Dec. 2000.
8. Andrew Tang, “Ping-pong amplifier with auto-zeroing and chopping”. US patent Nr. 6,476,671, issued May 11 2002. Analog Devices.
9. Christian Enz, and Gabor Temes, “Circuit techniques for reducing the effect of OpAmp imperfections: Autozeroing, correlated double sampling and Chopper Stabilization”, Proceedings of the IEEE, Vol. 84, No. 11, Nov. 1996.
10. Johan Huijsing, Jeroen Fonderie, and Behzad Shahi, “Frequency stabilization of chopper-stabilized amplifiers”, US patent Nr. 7,209,000, April 24, 2007.
11. Johan F. Witte, Kofi Makinwa, and Johan Huijsing, “A CMOS chopper offset-stabilized OpAmp”, 2006 European Solid-State Circuits Conference, Proceedings pp. 360–363.
12. Rod Burt, and Joy Zhang, “A micropower chopper-stabilized operational amplifier using a SC notch filter with synchronous integration inside the continuous-time signal path”, IEEE Journal of Solid-State Circuits, Vol. 41, No. 12, Dec. 2006, pp. 2729–2736.
13. Johan F. Witte, Johan Huijsing, and Kofi Makinwa, “A current-feedback instrumentation amplifier with 5  $\mu$ V offset for bidirectional high-side current sensing”, IEEE Solid-State Circuits Conference 2008, San Francisco, Session 3.5, Feb. 4–6, 2008.
14. Johan Huijsing, and Jeroen Fonderie, “Chopper Chopper-Stabilized operational amplifiers and methods”, US patent Nr. 6,734,723, May 11, 2004.
15. Johan Huijsing, and Behzad Shahi, “Chopper Chopper-Stabilized instrumentation and operational amplifiers”, US patent Nr. 7,132,883, Nov. 7, 2006.

# Current Sense Amplifiers with Extended Common Mode Voltage Range

W.J. Kindt

**Abstract** Current sense amplifiers are voltage difference amplifiers that sense the voltage across a shunt resistor to measure a current. In many applications the common mode voltage on the shunt resistor is much larger than the supply voltage of the current sense amplifier. Such current sense amplifiers, with an extended common mode voltage range, are discussed. They can be designed either in a low-voltage process, such that the common mode voltage range is extended both beyond the supply rails and beyond the intrinsic process capabilities, or in a high-voltage process, which typically results in better performance. Multiple current sense amplifier topologies are reviewed and compared. Experimental results on two new circuits are presented. One novel circuit is designed in a low-voltage silicon-on-insulator process, which results in significant performance benefits over a standard low-voltage process.

## 1 Introduction

In the past one or two decades, many applications have emerged that require the measurement of electrical currents. Examples of such applications are:

- The measurement of the charge or discharge current of rechargeable batteries, for instance the batteries in notebook computers or in (hybrid) electric vehicles. Knowledge of the current in these applications can extend battery life and/or enhance battery safety.
- The measurement of the current through LED light sources. The luminosity of LEDs is proportional to the current, so controlling the current is an effective way to control the brightness.
- The measurement of the currents flowing through the solenoids in electromotors in industrial or automotive applications. The force exerted by the electromotor is proportional to the current through the solenoid. When creating a mechanical control system, it is often easier to measure this current than to measure the mechanical force.

---

W.J. Kindt (✉)

Delft Design Center, National Semiconductor Corporation, Delft, The Netherlands  
e-mail: Wilko.kindt@nsc.com

Some of these examples are high-volume applications, so there is an attractive market for ICs that implement the current measurement function.

A very convenient way to measure the current flowing through a conductor is to insert a small shunt resistor that converts the current into a voltage, and to measure that voltage. An advantage of this method is that the value of the shunt can be chosen to adapt the range of the current measurement instrument. A disadvantage of the shunt resistor method is that some power is dissipated in the shunt. Obviously, the power loss can be minimized by reducing the value of the shunt. This in turn reduces the voltage across the shunt and puts more stringent demands on the circuit that measures it.
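The trade-off between shunt dissipation and sense voltage follows directly from  $V = I R$  and  $P = I^2 R$ . The sketch below uses a hypothetical 10 A load, two hypothetical shunt values and a hypothetical 100 µV amplifier offset to show how a smaller shunt saves power but makes a fixed offset a larger fraction of the measured voltage.

```python
def shunt_operating_point(i_load: float, r_shunt: float) -> tuple[float, float]:
    """Return (sense voltage I*R, dissipated power I^2*R) of a shunt resistor."""
    return i_load * r_shunt, i_load ** 2 * r_shunt


def relative_offset_error(v_os: float, v_sense: float) -> float:
    """Amplifier offset expressed as a fraction of the sense voltage."""
    return v_os / v_sense


for r_shunt in (10e-3, 1e-3):                      # hypothetical 10 mOhm and 1 mOhm shunts
    v, p = shunt_operating_point(10.0, r_shunt)      # hypothetical 10 A load current
    err = relative_offset_error(100e-6, v)           # hypothetical 100 uV amplifier offset
    print(f"{r_shunt * 1e3:.0f} mOhm: {v * 1e3:.0f} mV, {p:.2f} W, offset error {err:.1%}")
```

A tenfold smaller shunt cuts the dissipation tenfold, but the same offset then causes a tenfold larger relative error.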

In the applications given above, the currents often have to be measured at voltages that are relatively large compared to the supply voltage of modern integrated circuits. For instance, automotive applications require the measurement of currents flowing through conductors that are at the battery voltage (12 V, 24 V or even higher). In many applications, the signal representing the current will be fed into an analog-to-digital converter (ADC). The ADC typically runs from supply voltages that are much smaller than the voltages in the system in which the current is to be measured. Therefore, there is a need for integrated circuits that amplify the small differential voltage across the shunt and transform it into a signal that is typically ground referenced, while rejecting a potentially large common mode voltage. Such circuits are called current sense amplifiers, and they are the subject of this paper.

In some applications the common mode voltage may not only be large, but also have a very large AC component. This occurs, for instance, when the current through a solenoid is controlled using a power MOS switch and pulse width modulation, as presented in Fig. 1. When the MOSFET is on, the solenoid is connected to the battery and the current through the solenoid increases. During this time, the common mode voltage on the shunt resistor is close to ground. When the MOSFET is off, the current through the solenoid flows in the loop formed by the solenoid, the shunt resistor and the freewheeling diode. During this time, the current gradually decreases because of the power losses in this loop, and the common mode voltage on the shunt resistor is slightly above the battery voltage. The current sense amplifier should completely reject this large, varying common mode voltage while amplifying and level shifting the small differential voltage across the shunt.

Fig. 1 Control of current through a solenoid using pulse width modulation
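The behaviour described above can be reproduced with a crude forward-Euler model of the circuit in Fig. 1. The model is idealized and its parameters are hypothetical: an ideal switch, a fixed diode drop, the shunt lumped into the solenoid resistance, and the shunt common mode taken as exactly 0 V (switch on) or  $V_{bat} + V_{diode}$  (switch off). It shows the slow current ramp superimposed on a common mode voltage that swings by more than the battery voltage every PWM cycle.

```python
def pwm_solenoid(v_bat=12.0, v_diode=0.7, l=10e-3, r=2.0, duty=0.5,
                 f_pwm=10e3, periods=500, steps_per_period=200):
    """Idealized Fig. 1 model; returns (solenoid current, shunt common-mode voltage) per step."""
    dt = 1.0 / (f_pwm * steps_per_period)
    i, currents, v_cm = 0.0, [], []
    for k in range(periods * steps_per_period):
        on = (k % steps_per_period) < duty * steps_per_period
        if on:
            di_dt = (v_bat - r * i) / l       # MOSFET on: solenoid across the battery
            v_cm.append(0.0)                   # shunt sits close to ground
        else:
            di_dt = (-v_diode - r * i) / l    # MOSFET off: current freewheels through the diode
            v_cm.append(v_bat + v_diode)       # shunt sits slightly above the battery voltage
        i += di_dt * dt
        currents.append(i)
    return currents, v_cm


currents, v_cm = pwm_solenoid()
last = currents[-200:]                          # last PWM period (steady state)
print(f"mean current {sum(last) / len(last):.3f} A, "
      f"common-mode swing {max(v_cm) - min(v_cm):.1f} V")
```

In steady state the mean current approaches  $(D \cdot V_{bat} - (1 - D) V_{diode})/R$  for duty cycle  $D$ , while the shunt common mode toggles between about 0 V and  $V_{bat} + V_{diode}$  at the PWM frequency.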

The name ‘current sense amplifier’ is somewhat misleading because it can be interpreted as an amplifier with a current input. In fact, a current sense amplifier is simply a voltage difference amplifier that is designed and optimized for sensing the differential voltage across a shunt resistor while rejecting large input common mode voltages. From this description it is clear that a current sense amplifier is quite similar to an instrumentation amplifier, and many of the design aspects of current sense amplifiers are indeed similar to those of instrumentation amplifiers. There is, however, an important difference: current sense amplifiers typically have an input common mode voltage range (CMVR) that extends far above, and sometimes also below, the supply rails. The CMVR extension is typically achieved at the cost of a reduced input impedance, which is not an important specification in the target applications.

The phrase ‘extended common mode voltage range’ in the title of this paper should be interpreted either as ‘extended beyond the supply’ or as ‘extended beyond the intrinsic capabilities of the process’. Provided some measures are taken, current sense amplifiers can be fabricated in low-voltage analog processes. In this case, the CMVR of the amplifier does not only extend beyond the supply, but also beyond the standard operating voltage range of the fabrication process. Such current sense amplifiers will be discussed in Section 2. It is also possible to design current sense amplifiers in fabrication processes that can intrinsically handle the large CMVR of the applications. As will be discussed in Section 3, such current sense amplifiers do provide some performance benefits. A new current sense amplifier fabricated in Silicon On Insulator (SOI) will be presented in Section 4. A classification of current sense amplifier topologies will be given in Section 5. Finally, Section 6 will present some conclusions and a future outlook.

## 2 Current Sense Amplifiers in Low-Voltage Fabrication Processes

In most low-voltage IC fabrication processes the junction breakdown voltages of the active components (e.g. the base-collector breakdown voltage of the bipolar transistors) are only somewhat larger than the intended maximum supply voltage. This means it is very hard to process the high-voltage signals at the input of a current sense amplifier directly with these active components. Moreover, in most processes all active components are junction isolated from the substrate, and the breakdown voltage of the isolation junctions poses another limitation on the maximum voltage that can be processed. The only option the designer has left is to use thin-film or poly resistors at the inputs. Because these resistors are oxide isolated from the substrate, they can withstand much larger voltages.

Of course, the oxide breakdown voltage towards the substrate is still a limit to the maximum voltage that can be handled by the resistors. Also, there is still the issue of input ESD protection. Even if the inputs are only connected to oxide isolated polysilicon or thin-film resistors, it will still be necessary to protect the inputs against ESD events, if only to protect against dielectric breakdown during human body ESD events at 2 kV or higher. In practice this means that many standard low-voltage processes are not suited for the implementation of high-CMVR current sense amplifiers while in some other processes special measures will have to be taken [1].

### 2.1 Dynamic Bridge

Figure 2 shows a schematic of a circuit known as a dynamic bridge. Without the resistors  $R_2$ , the circuit simplifies to a standard opamp difference amplifier with input resistors  $R_1$  and feedback resistors  $R_3$ . The differential input voltage is amplified by a factor  $-R_3/R_1$  and level shifted to be referenced to the voltage on the  $V_{ref}$  input. Because the gains are typically much larger than one, the input resistors  $R_1$  are typically much smaller than the feedback resistors  $R_3$ , and the common mode voltage at the input of the opamp is almost equal to the common mode voltage at the input of the overall circuit. Therefore, the input CMVR of the difference amplifier is limited by the input CMVR of the opamp, which is usually restricted by the supply rails. To extend the input CMVR of the overall circuit, the resistors  $R_2$  are added. Together with the resistors  $R_1$ , these create a bridge that divides the input common mode voltage. The resistors  $R_2$  do not affect the signal transfer because there is no differential voltage across the opamp inputs and the resistors  $R_2$  do not unbalance the overall circuit.



**Fig. 2** A dynamic bridge current sense amplifier implementation

#### 2.1.1 Offset Performance

The resistors  $R_2$  significantly increase the gain with which the offset of the opamp is referred back to the input of the system. The system offset voltage  $V_{os}$  due to the opamp offset  $V_{os,amp}$  equals:

$$V_{os} = \frac{R_1 R_2 + R_1 R_3 + R_2 R_3}{R_2 R_3} \cdot V_{os,amp} = \left(1 + \frac{R_1}{R_2 \parallel R_3}\right) V_{os,amp} \quad (1)$$

In typical applications, the CMVR should be extended significantly, which requires that  $R_2$  is much smaller than  $R_1$ . For gains larger than one it follows that  $R_3 > R_1 > R_2$  and the offset gain can be simplified to  $R_1/R_2$ . This ratio ( $R_1/R_2$ ) will appear frequently. If the input CMVR of the opamp is approximately equal to the supply voltage, then the input CMVR of the circuit in Fig. 2 is approximately equal to  $R_1/R_2$  times the supply voltage. It is obvious that larger CMVRs can easily be obtained by increasing  $R_1/R_2$  but this immediately carries an offset penalty.
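Equation (1) and the rough CMVR estimate can be evaluated numerically. The sketch below uses hypothetical resistor values that satisfy  $R_3 > R_1 > R_2$  (differential gain  $R_3/R_1 = 10$ ,  $R_1/R_2 = 20$ ) and an assumed opamp CMVR of about 5 V; it confirms that both forms of (1) agree and that the offset gain lands close to  $R_1/R_2$ .

```python
def parallel(a: float, b: float) -> float:
    """Parallel combination of two resistors."""
    return a * b / (a + b)


def offset_gain(r1: float, r2: float, r3: float) -> float:
    """Offset gain of (1): 1 + R1 / (R2 || R3)."""
    return 1 + r1 / parallel(r2, r3)


def approx_cmvr(r1: float, r2: float, opamp_cmvr: float) -> float:
    """Rough input CMVR: the bridge divides the input common mode by about R1/R2."""
    return r1 / r2 * opamp_cmvr


r1, r2, r3 = 100e3, 5e3, 1e6     # hypothetical resistor values
print(offset_gain(r1, r2, r3))   # close to R1/R2 = 20
print(approx_cmvr(r1, r2, 5.0))  # for an assumed opamp CMVR of 5 V
```

Doubling  $R_1/R_2$  in this sketch roughly doubles both the CMVR and the offset gain, which is the trade-off stated above.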

#### 2.1.2 Noise Performance

The common mode divider resistors  $R_2$  also significantly degrade the input referred noise of the current sense amplifier. An evaluation of the input voltage noise density due to the three resistors in the circuit results in:

$$u_n^2 = 8kTR_1 \left(1 + \frac{R_1}{R_2} + \frac{R_1}{R_3}\right) \quad (2)$$

The term to the left of the brackets is the noise due to the input resistors which obviously has to be expected in a topology with resistors at its input. From the equation and the inequality  $R_3 > R_1 > R_2$  discussed above, it follows that due to the presence of the resistors  $R_2$ , the equivalent input referred noise density increases approximately by a factor  $\sqrt{R_1/R_2}$ .
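A direct evaluation of (2) makes the  $\sqrt{R_1/R_2}$  penalty concrete. Note that (2) covers only the three resistor pairs, not the opamp's own noise; the resistor values and the 300 K temperature below are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def input_noise_density(r1: float, r2: float, r3: float, temp_k: float = 300.0) -> float:
    """Input-referred resistor noise density of (2), in V/sqrt(Hz)."""
    return math.sqrt(8 * K_B * temp_k * r1 * (1 + r1 / r2 + r1 / r3))


r1, r2, r3 = 100e3, 5e3, 1e6                          # hypothetical values, R1/R2 = 20
with_bridge = input_noise_density(r1, r2, r3)
without_bridge = input_noise_density(r1, math.inf, r3)  # R2 removed (R2 -> infinity)
print(f"{with_bridge * 1e9:.0f} nV/sqrt(Hz)")
print(with_bridge / without_bridge)                    # compare with sqrt(R1/R2)
```

The computed ratio is close to  $\sqrt{20} \approx 4.5$ , consistent with the approximation in the text.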

#### 2.1.3 Speed Performance

The presence of the resistors  $R_2$  reduces the speed of the current sense amplifier. Assuming the opamp has a single pole transfer characteristic with a gain bandwidth product  $\omega_0$ , the AC transfer function of the system shown in Fig. 2 is:

$$\frac{u_o}{u_i} = \frac{R_3}{R_1} \frac{1}{1 + \frac{j\omega}{\omega_0} \left(1 + \frac{R_3}{R_1} + \frac{R_3}{R_2}\right)} \quad (3)$$

Without the presence of the resistors  $R_2$  ( $R_2 \rightarrow \infty$  in the equation) this equation shows that for relatively large system gains  $R_3/R_1$ , the pole of the system transfer function is approximately located at the gain-bandwidth product of the opamp  $\omega_0$  divided by the gain. With the resistors  $R_2$  present and assuming  $R_3 > R_1 > R_2$  the pole moves to  $\omega_0$  divided by  $R_3/R_2$ , which is at a factor  $R_1/R_2$  lower frequency.
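The pole shift predicted by (3) can be checked numerically. The sketch assumes a hypothetical opamp with a 1 MHz gain-bandwidth product and the same hypothetical resistor values as before; it computes the closed-loop pole with and without  $R_2$  and verifies that the transfer magnitude is 3 dB down at that pole.

```python
import math


def closed_loop_pole(omega0: float, r1: float, r2: float, r3: float) -> float:
    """Pole of (3): omega0 divided by 1 + R3/R1 + R3/R2."""
    return omega0 / (1 + r3 / r1 + r3 / r2)


def transfer(omega: float, omega0: float, r1: float, r2: float, r3: float) -> complex:
    """Complex transfer function u_o/u_i of (3)."""
    return (r3 / r1) / (1 + 1j * omega / omega0 * (1 + r3 / r1 + r3 / r2))


w0 = 2 * math.pi * 1e6                                       # hypothetical 1 MHz GBW
with_bridge = closed_loop_pole(w0, 100e3, 5e3, 1e6)
without_bridge = closed_loop_pole(w0, 100e3, math.inf, 1e6)  # R2 -> infinity
print(without_bridge / with_bridge)                          # compare with R1/R2 = 20
print(abs(transfer(with_bridge, w0, 100e3, 5e3, 1e6)))       # gain 10 reduced by sqrt(2)
```

The bandwidth drops by  $211/11 \approx 19$ , close to  $R_1/R_2 = 20$ , while the low-frequency gain  $R_3/R_1$  is unchanged.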

The conclusion is that the offset, noise and bandwidth are all degraded by the same ratio  $R_1/R_2$ .

#### 2.1.4 Matching Requirements

In the circuit of Fig. 2, the high input common mode voltage is not isolated from the output signal: there are resistive paths between the input and the output. In high-performance instrumentation amplifier topologies, a combination of isolation and balancing ensures a good CMRR [2]. In the dynamic bridge, the CMRR is guaranteed only by the degree of balancing that can be achieved, so mismatches in the resistor pairs cause errors in the output voltage. The effect of resistor mismatch in the dynamic bridge circuit can be analyzed using the circuit shown in Fig. 3. One can calculate the input offset voltage required to counterbalance the effect of resistor mismatch, assuming that the input is biased at a common mode voltage  $V_{cm}$ , that the output and the reference input are both fixed at an output reference voltage  $V_{ref}$ , and that the input common mode divider resistors  $R_2$  are biased at an internal reference voltage  $V_{int}$ . It is easier to use the conductances ( $g_1$ ,  $g_2$  and  $g_3$ ) instead of the resistances ( $R_1$ ,  $R_2$  and  $R_3$ ). A straightforward calculation results in:



**Fig. 3** Circuit schematic used for calculating the effect of resistor mismatch

$$\begin{bmatrix} \frac{\partial V_{os}}{\partial V_{cm}} \\ \frac{\partial V_{os}}{\partial V_{int}} \\ \frac{\partial V_{os}}{\partial V_{ref}} \end{bmatrix} = \frac{1}{g_1 + g_2 + g_3} \begin{bmatrix} g_2 + g_3 & -g_2 & -g_3 \\ -g_2 & \frac{g_2(g_1 + g_3)}{g_1} & -\frac{g_2 g_3}{g_1} \\ -g_3 & -\frac{g_2 g_3}{g_1} & \frac{g_3(g_1 + g_2)}{g_1} \end{bmatrix} \begin{bmatrix} \frac{\Delta R_1}{R_1} \\ \frac{\Delta R_2}{R_2} \\ \frac{\Delta R_3}{R_3} \end{bmatrix} \quad (4)$$

The first term on the left is the derivative of the offset with respect to the input common mode voltage, which is the inverse of the common mode rejection ratio (CMRR). The other terms are similar rejection ratios for the external and internal reference voltages. Considering that  $g_2 > g_1 > g_3$ , it can be seen from the matrix equation that, for instance, the CMRR term  $\partial V_{os}/\partial V_{cm}$  is approximately proportional to the mismatches in  $R_1$  and  $R_2$ . To achieve a CMRR on the order of 80 dB, the mismatches of these resistor pairs should be on the order of 0.01%. Obviously, to achieve that kind of CMRR performance, the resistors will have to be trimmed.
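The CMRR claim can be checked with the first row of (4) alone. The sketch uses hypothetical resistor values with  $g_2 > g_1 > g_3$  and applies a 0.01% mismatch to one resistor pair at a time; the function names are illustrative.

```python
import math


def cm_sensitivity(r1, r2, r3, dr1, dr2, dr3):
    """First row of (4): dVos/dVcm for relative mismatches dR1/R1, dR2/R2, dR3/R3."""
    g1, g2, g3 = 1 / r1, 1 / r2, 1 / r3
    return ((g2 + g3) * dr1 - g2 * dr2 - g3 * dr3) / (g1 + g2 + g3)


def cmrr_db(sensitivity):
    """CMRR in dB as the inverse of dVos/dVcm."""
    return -20 * math.log10(abs(sensitivity))


# Hypothetical bridge: R1 = 100 kOhm, R2 = 5 kOhm, R3 = 1 MOhm (so g2 > g1 > g3).
for label, mism in [("R1", (1e-4, 0, 0)), ("R2", (0, 1e-4, 0)), ("R3", (0, 0, 1e-4))]:
    print(f"0.01% mismatch in {label}: CMRR = {cmrr_db(cm_sensitivity(100e3, 5e3, 1e6, *mism)):.1f} dB")
```

A 0.01% mismatch in  $R_1$  or  $R_2$  gives roughly 80 dB, whereas the same mismatch in  $R_3$  barely affects the CMRR.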

#### 2.1.5 Trimming Considerations

Ideally all rejection ratios should be infinite (the derivatives in (4) should be zero). The circuit can be optimized by trimming the resistors to improve their matching. The matrix equation implies that it is not straightforward to sequentially measure one of these quantities and trim a single resistor pair because each resistor mismatch term affects all three rejection ratios. In other words: the trims won't be orthogonal.

A potential resistor trim algorithm could be based on the fact that the matrix equation above can be inverted to find the optimal trim values (the change in the resistor matching) as a function of the measured values of the derivatives on the left-hand side of (4). This matrix inversion is problematic, though, because the matrix is singular. This is because it is not necessary to match all three resistors exactly: there is an additional degree of freedom, because the two sides of the bridge only need to have matched ratios, not matched absolute values. For instance, the circuit would still be perfectly balanced if  $R_1 = \alpha R_1'$ ,  $R_2 = \alpha R_2'$  and  $R_3 = \alpha R_3'$ , where the primed values belong to the other half of the bridge. For this reason, one has the freedom to measure only two rejection ratios and trim only two resistor pairs.
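The extra degree of freedom can also be checked directly from the node equations of the bridge in Fig. 3, without using the matrix. In the sketch below (hypothetical resistor values and helper names), each opamp input node is the conductance-weighted average of  $V_{cm}$ ,  $V_{int}$  and  $V_{ref}$ , and the input offset is the differential input voltage needed to re-balance the two nodes. Scaling all resistors of one half by the same factor leaves the offset at zero, while a lone 0.01% mismatch in  $R_1$  does not.

```python
def node_voltage(r1, r2, r3, v_cm, v_int, v_ref):
    """Opamp input node of one bridge half (ideal opamp, output held at V_ref)."""
    g1, g2, g3 = 1 / r1, 1 / r2, 1 / r3
    return (g1 * v_cm + g2 * v_int + g3 * v_ref) / (g1 + g2 + g3)


def input_offset(half_a, half_b, v_cm, v_int, v_ref):
    """Differential input voltage that re-balances the two opamp input nodes."""
    r1, r2, r3 = half_b
    g1, g2, g3 = 1 / r1, 1 / r2, 1 / r3
    dv = node_voltage(*half_a, v_cm, v_int, v_ref) - node_voltage(*half_b, v_cm, v_int, v_ref)
    # An input step on side B reaches its node attenuated by g1 / (g1 + g2 + g3).
    return dv * (g1 + g2 + g3) / g1


nominal = (100e3, 5e3, 1e6)
scaled = tuple(1.01 * r for r in nominal)   # all resistors 1% high: ratios unchanged
skewed = (100e3 * 1.0001, 5e3, 1e6)         # only R1 off by 0.01%

print(input_offset(nominal, scaled, 60.0, 2.5, 2.5))   # essentially zero
print(input_offset(nominal, skewed, 60.0, 2.5, 2.5))   # a few mV of offset
```

Because uniform scaling produces no error, one resistor pair can serve as the reference and only the other two pairs need trimming.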

### 2.2 Improved Dynamic Bridge

It is clear from the above discussion that making the resistors  $R_2$  as large as possible optimizes the performance parameters of the current sense amplifier. A good way to achieve this for circuits that have to sense both above the positive rail and below the negative rail is illustrated in Fig. 4 [3]. The common node of the  $R_2$  resistors is driven by a voltage source that changes inversely with the input common mode voltage detected at the input of the opamp. In this manner, the circuit can sense above the positive rail as well as below the negative rail. In the circuit shown in Fig. 2, sensing below ground is only possible by driving the common node of the resistors  $R_2$  with a reference voltage larger than zero, which would reduce the maximum input common mode voltage that the bridge can handle for a given value of  $R_2$ . The circuit shown in Fig. 4 is an elegant way to allow sensing beyond both supply rails without degrading the  $R_1/R_2$  ratio. It should be noted, though, that there is no performance improvement for circuits that only have to sense beyond one of the supply rails.

**Fig. 4** A dynamic bridge current sense amplifier with additional common mode control loop to maximize the value of  $R_2$

### 2.3 Dynamic Level Shift

As discussed in the prior paragraphs, the common mode dividing bridge in the dynamic bridge current sense amplifier is degrading offset, noise and speed performance. Also the common mode rejection ratio of the circuit is limited by resistor matching and the trim is somewhat complicated because the trim of the three resistor pairs in the bridge is not orthogonal. It is interesting to review an alternative circuit topology which will be referred to as dynamic level shift. This topology is shown in Fig. 5. There is a control loop that detects the common mode at the input of the opamp and controls the value of the current sources  $I_2$  below the input resistors  $R_1$  in such a way that the voltage at the input of the amplifier is equal to a reference voltage. In this manner, the voltage drop across the input resistors  $R_1$  is controlled such that the resistors implement a voltage level shift between the input common mode voltage and the internal reference voltage.

At first glance one might think that the performance of the circuit shown in Fig. 5 is much better than that of the dynamic bridge, because the relatively low-valued resistors  $R_2$  that caused the performance degradation in the dynamic bridge have been replaced with current sources with a much larger output impedance. However, the physical implementation of the current sources reintroduces most of the problems.

**Fig. 5** Dynamic level shift current sense amplifier



Figure 6 shows the dynamic level shift circuit with the current sources implemented using degenerated bipolar transistors. Degenerated bipolars are a very accurate and low-noise way to create current sources. For this application, however, their performance is not sufficient. The problem is that the resistors  $R_2$  used to implement the current sources have to have a similar, or even smaller, value compared to the common mode divider resistors in the dynamic bridge. These resistors still have to carry the entire common mode level shift current flowing through the input resistors, and the voltage headroom available across the resistors  $R_2$  is still limited to the supply voltage of the circuit. This means that the white noise generated by the current sources is still the dominant noise source in the circuit (see Equation (2)). Also, the base-emitter voltage ( $V_{be}$ ) mismatch of the BJTs in the current sources is gained up by a ratio  $R_1/R_2$  when referred back to the input of the current sense amplifier. The  $V_{be}$  mismatch of the bipolars in the current sources is expected to be of the same order of magnitude as the  $V_{be}$  mismatch of the bipolars used in the input stage of the opamp. This means that, instead of ‘solving the problem’ of the high offset caused by the opamp’s offset being gained up when referred to the input, the dynamic level shift just ‘moves the problem’ from the components in the opamp’s input stage to the components in the level shift current sources.

**Fig. 6** Dynamic level shift current sense amplifier with the current sources implemented with degenerated bipolar transistors
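The headroom argument can be quantified with a short sketch. It assumes a hypothetical 100 kΩ input resistor, a 60 V maximum input common mode, a 2.5 V internal node and 2 V of headroom across  $R_2$ ; the level shift current  $I_2 = (V_{cm,max} - V_{node})/R_1$  then fixes the largest usable  $R_2$ , and a  $V_{be}$  mismatch is multiplied by  $R_1/R_2$  at the input. Names and values are illustrative.

```python
def level_shift_r2(v_cm_max: float, v_node: float, headroom: float, r1: float) -> float:
    """Largest degeneration resistor R2 that can carry I2 = (V_cm,max - V_node)/R1
    within the available voltage headroom."""
    i2 = (v_cm_max - v_node) / r1
    return headroom / i2


def vbe_mismatch_offset(delta_vbe: float, r1: float, r2: float) -> float:
    """Input-referred offset: V_be mismatch of the current sources gained up by R1/R2."""
    return delta_vbe * r1 / r2


r1 = 100e3                                  # hypothetical input resistor
r2 = level_shift_r2(60.0, 2.5, 2.0, r1)     # 60 V input, 2.5 V node, 2 V headroom
print(f"R1/R2 = {r1 / r2:.2f}")
print(f"offset for 0.5 mV V_be mismatch: {vbe_mismatch_offset(0.5e-3, r1, r2) * 1e3:.1f} mV")
```

Note that the result depends only on the voltage ratio  $(V_{cm,max} - V_{node})/V_{headroom}$ , not on the absolute value chosen for  $R_1$ .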

### 2.4 Dynamic Level Shift with Dynamically Matched Level Shift Current Sources

A way to improve the performance of the dynamic level shift circuit, and regain the offset performance advantage, is to dynamically match the level shift current sources, as shown in Fig. 7 [4]. The two current sources can be dynamically matched by commutating them at a certain clock frequency. If the commutation frequency is chosen higher than the bandwidth of the forward differential signal path of the current sense amplifier, the chopper residuals can be filtered by the differential amplifier. As shown in the figure, it is possible to filter the chopper residuals right at the point where they are generated, by adding a filter capacitor across the inputs of the amplifier stage that reads out the differential voltage behind the input level shift resistors  $R_1$ . Obviously, additional filter poles can be placed in the signal path behind the input level shift stage. A nice feature of the topology is that the chopping only occurs in the common mode level shift path, while the differential signal path is not affected.



**Fig. 7** Dynamic level shift current sense amplifier with dynamically matched current sources

This may reduce the problem of chopper residuals appearing at the output. On the other hand, the circuit commutates relatively large level shift currents, which may introduce relatively large chopper residuals. It should be noted that the noise contribution of the current source resistors  $R_2$  is not affected by the chopper switches, so the white noise of the circuit shown in Fig. 7 remains very high. Only the offset and  $1/f$  noise are eliminated by the chopper.
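The effect of commutating the two level shift current sources can be sketched in an idealized way: two sources with a relative mismatch  $\pm\epsilon$  are swapped every half clock period, and the differential voltage behind the  $R_1$  resistors is recorded. The current, mismatch and resistor values and the function name are hypothetical. The average error vanishes, but a square-wave residual of amplitude  $2 \epsilon I R_1$  remains and must be filtered, which illustrates why large level shift currents can produce large chopper residuals.

```python
def commutated_level_shift(i_nom: float, eps: float, r1: float, n_half_periods: int) -> list[float]:
    """Differential error voltage behind the two R1 resistors when two mismatched
    level shift sources I*(1+eps) and I*(1-eps) are swapped every half clock period."""
    i_a, i_b = i_nom * (1 + eps), i_nom * (1 - eps)
    diff = []
    for k in range(n_half_periods):
        left, right = (i_a, i_b) if k % 2 == 0 else (i_b, i_a)  # commutation switches
        diff.append(r1 * (left - right))
    return diff


diff = commutated_level_shift(500e-6, 1e-3, 100e3, 8)  # hypothetical values
print(max(diff), sum(diff) / len(diff))                 # residual amplitude and average error
```

With these numbers a 0.1% current mismatch leaves a 0.1 V square-wave residual with zero average, which the filter capacitor and any later filter poles have to suppress.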

A practical implementation of this circuit has shown very good performance compared to other current sense amplifiers that use resistors to absorb the voltage difference between the high voltage input signal and the low voltage supply domain. The fabricated circuit has an input CMVR of  $-20$  to  $+60$  V when running on a 5 V supply. Figure 8 shows offset ( $V_{os}$ ), temperature coefficient of the offset ( $TCV_{os}$ ), and common mode rejection ratio (CMRR) histograms of this circuit. The  $V_{os}$  is only a few hundred  $\mu$ V, the  $TCV_{os}$  is below  $10\mu\text{V}/^\circ\text{C}$  and the CMRR is better than  $100$  dB at room temperature and close to  $100$  dB at the industrial temperature extremes. Also shown in the figure is the input referred noise density, which is still very high due to the use of resistors.
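To put the measured CMRR in perspective, the sketch below converts a CMRR in dB into the input-referred error caused by a common mode change. It assumes, as a round number, a CMRR of exactly 100 dB applied over the full −20 V to +60 V input range of the fabricated circuit; the function name is illustrative.

```python
def cm_induced_error(delta_vcm: float, cmrr_db: float) -> float:
    """Input-referred error caused by a common mode change delta_vcm at a given CMRR."""
    return delta_vcm / 10 ** (cmrr_db / 20)


swing = 60.0 - (-20.0)                 # full input CMVR of the fabricated circuit, in volts
err = cm_induced_error(swing, 100.0)   # assumed round-number CMRR of 100 dB
print(f"{err * 1e6:.0f} uV")
```

Even at 100 dB, a full-range common mode excursion therefore produces an input-referred error of the same order as the measured offset of a few hundred µV.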



**Fig. 8**  $V_{os}$ ,  $TCV_{os}$ , CMRR and noise performance achieved with a practical implementation of the dynamically matched dynamic level shift circuit with an input CMVR from  $-20$  to  $+60$  V operating from a 5 V supply

## 3 Current Sense Amplifiers in High Voltage Fabrication Processes

This section will discuss current sense amplifiers fabricated in a high voltage process. It will be shown that these circuits have significant advantages over the resistor topologies discussed in the prior section.

### 3.1 Operational Amplifier with Extended CMVR

Most current sense amplifiers that are fabricated in a high-voltage process technology use an input stage similar to the one shown in Fig. 9. This self-biasing input stage can be used to extend the input CMVR of an operational amplifier significantly beyond its supply rail [5]. The transistors  $Q_1$  and  $Q_2$  form a floating mirror that sets up a bias current in the transistors. The two transistors  $Q_2$  are an emitter-driven differential pair that converts the differential input voltage into a differential output current  $g_m \times V_{in}$ . The emitters and bases of all transistors are close to the input common mode voltage, while the collectors of the input devices  $Q_2$  can be at a relatively low potential so that they can interface with a second stage biased from a low voltage supply. In this manner, the base-collector junction blocks the high input common mode voltage. The maximum input CMVR is limited by the base-collector breakdown voltage of the PNP transistors. A similar topology can also be implemented with LDMOS transistors.

The input stage in Fig. 9 is a common-base input stage, which has voltage gain but no current gain. It therefore draws current from the signal source. For some applications this makes the input stage less suitable, but it is still very well suited for current sense amplifier applications.

Both the mismatch between the two input transistors  $Q_1$  and the mismatch between the bias transistors  $Q_2$  and the input devices  $Q_1$  introduce an input referred offset. It is often possible to modify the general principle shown in Fig. 9 so that the offset contribution of the bias transistors is avoided.



**Fig. 9** Extended CMVR input stage

**Fig. 10** Simple current sense amplifier using the extended CMVR input stage



An opamp using the input stage shown in Fig. 9 can be combined with feedback resistors to implement a current sense amplifier, as shown in Fig. 10. The major advantage of this circuit is that the common mode divider resistors  $R_2$  shown in Fig. 2 are no longer required, which avoids most of the associated disadvantages. The CMRR of the resulting circuit still depends on the matching of the resistor pairs  $R_1$  and  $R_3$, so trimming would still be required.

### 3.2 Current Sense Amplifiers Based on Accurate High Voltage V-I Convertor

Figure 11 illustrates another operating principle on which current sense amplifiers fabricated in high-voltage processes can be based. The figure shows an accurate high-voltage V-I convertor that transforms the differential input voltage into a differential current. The PMOS transistors are gain boosted to achieve a very low input impedance. The gain-boosting amplifier ensures that the voltages at the sources of the PMOS transistors are equal. In this manner, the transconductance of the V-I convertor is made accurate and equal to  $1/R_1$ . Note that the high voltage is blocked by the drains of the transistors  $MP_1$  and that there is no resistive coupling between input and output. Therefore, the common mode rejection ratio is not limited by resistor matching.

As schematically shown in the figure, the gain-boost amplifier can run from a floating supply that is biased from the high common mode input voltage. This floating supply voltage can, for instance, be created using a Zener diode. In a high voltage process it is also possible to bias the entire gain-boost amplifier from the input common mode voltage source. An alternative solution is shown in Fig. 12. In this circuit, not only the source-driven differential input stage is self-biased from the input common mode voltage, but the gain-boost amplifier is self-biased as well.

**Fig. 11** An accurate V-I converter can operate in a high common mode voltage domain, and the output current can be transmitted to the low-supply voltage domain using high-voltage PMOS transistors

**Fig. 12** A possible implementation of a self-biased gain-boosted V-I convertor

The readout circuit that processes the drain currents can run from a low supply voltage. The drain junctions of the PMOS devices in the input V-I convertor block the difference between the input common mode voltage and the supply voltage of the readout circuit. There are many ways to implement the low voltage readout circuitry. For instance, the differential drain current can simply be dumped into two matched resistors, and the resulting differential voltage can be processed with a standard low-voltage instrumentation amplifier. Alternatively, the differential currents can be transformed into a single-ended current using a current mirror. It is also possible to use a low-voltage transimpedance amplifier built around an operational amplifier.

### 3.3 Current Sense Amplifiers with Current Feedback to the High Voltage Inputs

The current sense amplifiers discussed in the previous section used a high accuracy voltage to current convertor operating at a high common mode voltage. The alternative solution shown in Fig. 13 does not require a high accuracy V-I convertor at high voltage. Instead, an accurate feedback current is generated by the readout circuit operating at low supply voltage, and that current is fed back in front of the high voltage input stage. Because of the loop gain in the readout circuit, the circuit is only balanced when the sources of the high-voltage PMOS input stage are at equal potential. This means that the differential input voltage is still accurately converted into a differential current through the input resistors, and the differential feedback current flowing through the NMOS transistors equals the differential current flowing through the input resistors. As a result, the differential voltage at the sources of the NMOS transistors equals the resistor ratio  $R_3/R_1$  times the differential input voltage. In practice, an additional feedback loop is needed to stabilize the common mode voltage at the opamp's input and output.

The output signal of the current sense amplifier shown in Fig. 13 appears as a differential voltage across the resistors  $R_3$ . Note that these resistors determine the voltage gain of the current sense amplifier. The circuit that takes the differential voltage from these resistors should have a very high input impedance to avoid affecting the gain accuracy of the system.

The principle of feeding back a signal current in front of the sources of the high voltage input transistors allows the creation of an elegant and compact current sense amplifier topology, as shown in Fig. 14. Assuming the input signal is positive, the current through  $MP_1'$  will be larger than the current through  $MP_1$ . The current through  $MP_1$  is mirrored by  $MN_2$  and  $MN_2'$  and summed with the current of  $MP_1'$ . As a result, the gate of  $MN_3$  is pulled high and a feedback current is pulled through  $R_1'$  to counteract the differential voltage between the drains of  $MP_1$  and  $MP_1'$ . The end result is that the differential input voltage is amplified by the ratio  $R_3/R_1$  and appears, referenced to ground, at the drain of  $MN_3$ . A disadvantage of this circuit is that it can only sense positive input voltages, and it is not very accurate for very small input signals.

**Fig. 13** A current sense amplifier with feedback using an accurate low-voltage V-I convertor that feeds back in front of the high-voltage input stage

**Fig. 14** A current sense amplifier using only a few components

## 4 Current Sense Amplifiers Created in Silicon On Insulator

Silicon On Insulator (SOI) technology allows a designer to float active components inside their oxide isolated tub at high potential relative to the substrate. This allows the combination of the high-voltage circuit topologies discussed in Section 3 with the principle of resistor-based level shifting that was discussed in Section 2.

An example of a circuit that can be created in SOI is given in Fig. 15. This circuit is somewhat similar to the current sense amplifier shown in Fig. 10, except that, instead of having a CMVR limitation imposed by the active component used in the common base (or gate) input stage, additional voltage room is created by introducing a level shift voltage across the level shift resistors  $R_2$  that tracks the input common mode voltage. The reference branch on the left hand side of the circuit sends a reference current through the input of the mirror  $Q_{11}$, and this current is mirrored into  $R_2$  and  $R_2'$  by  $Q_1$  and  $Q_1'$ . In this way, the common mode dependent voltage across  $R_{21}$  is copied onto  $R_2$  and  $R_2'$ . The voltage at the bottom end of  $R_2$  and  $R_2'$  is referenced to the negative rail, but because of the level shift voltage across these resistors, the collectors of  $Q_1$  and  $Q_1'$  track the input common mode voltage.

The PNP transistors  $Q_1$  and  $Q_1'$  act as the input stage of the feedback amplifier.  $Q_6$  and  $Q_6'$  form the second stage. The third stage is the output stage, represented schematically by a triangle. At the emitters of  $Q_6$  and  $Q_6'$, the common mode behind the level shift resistors is detected and fed back to the bases of the level shift current source transistors  $Q_4$  and  $Q_4'$ . The degeneration resistors  $R_4$  and  $R_4'$  do not introduce much noise, because these resistors do not have to be much smaller than the input resistors  $R_1$  and  $R_1'$ . There will be a lot of noise on the level shift resistors  $R_2$  and  $R_2'$, but this noise will be absorbed at the collectors of  $Q_1$  and  $Q_1'$ . Mismatch between  $R_2$  and  $R_2'$  will introduce a small voltage difference between the collectors of  $Q_1$  and  $Q_1'$, but this only has a second order effect on the currents flowing through the collectors of  $Q_1$  and  $Q_1'$, because the collectors have a relatively high small signal differential impedance.

**Fig. 15** A current sense amplifier implemented in SOI, with active components floating at the high input common mode voltage domain and resistive level shifts

A disadvantage of the circuit shown in Fig. 15 is that the bias current through  $Q_1$  and  $Q_1'$  depends on the input common mode voltage. Consequently, their transconductance is common mode dependent as well. Because  $Q_1$  and  $Q_1'$  are the input stage of the amplifier, this complicates the frequency compensation of the circuit. To avoid this, the input stage can be degenerated with  $R_5$  and  $R_5'$ .

Figure 16 shows experimental results obtained on a prototype test chip based on the principles shown in Fig. 15. This circuit achieves a CMVR of 3 to 60 V when operating from a 5 V supply. Twenty-five test chips were curve traced for their input offset voltage as a function of the input common mode voltage. From these curves, the offset and CMRR can be extracted. These test chips were not trimmed at all. With offset and CMRR trim added, offsets below a few hundred microvolts and common mode rejection ratios in excess of 100 dB are expected to be achievable. The figure also shows the input referred noise density, which, as expected, is much better than what can be achieved with the dynamic bridge or dynamic level shift circuits discussed in Section 2.

**Fig. 16** Experimental results achieved on 25 samples of an implementation of the circuit shown in Fig. 15: Input referred offset over common mode, extrapolated offset at 0 V common mode, CMRR and the noise on a single sample versus frequency
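The extraction of offset and CMRR from such curve traces can be sketched as a straight-line fit of offset versus common mode voltage. In this sketch, which is our assumption about the procedure and not necessarily the one used for Fig. 16, the intercept is the offset extrapolated to 0 V common mode and the slope  $\Delta V_{os}/\Delta V_{cm}$  gives the CMRR as  $20\log_{10}(1/|\text{slope}|)$. The data points below are hypothetical, chosen within the 3 to 60 V CMVR; they are not measured values.

```python
import math


def offset_and_cmrr(v_cm, v_os):
    """Least-squares fit v_os = a + b * v_cm.

    Returns (offset_at_0V, cmrr_dB): the intercept a is the offset
    extrapolated to 0 V common mode, and CMRR = 20*log10(1/|b|).
    """
    n = len(v_cm)
    mean_x = sum(v_cm) / n
    mean_y = sum(v_os) / n
    sxx = sum((x - mean_x) ** 2 for x in v_cm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(v_cm, v_os))
    slope = sxy / sxx                      # dVos/dVcm in V/V
    intercept = mean_y - slope * mean_x    # offset at 0 V common mode
    cmrr_db = math.inf if slope == 0 else 20 * math.log10(1 / abs(slope))
    return intercept, cmrr_db


# Hypothetical curve trace of one sample: 0.5 mV offset, 10 uV/V common mode slope
v_cm = [3, 10, 20, 30, 40, 50, 60]             # V
v_os = [0.5e-3 + 10e-6 * x for x in v_cm]      # V
offset0, cmrr = offset_and_cmrr(v_cm, v_os)
print(f"offset at 0 V: {offset0 * 1e3:.3f} mV, CMRR: {cmrr:.1f} dB")
```

A common mode slope of 10 µV/V corresponds to 100 dB of CMRR, the level the trimmed version of the circuit is expected to exceed.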

## 5 Classification of Current Sense Amplifier Topologies

Now that various current sense amplifier topologies have been reviewed, it is instructive to classify them according to their operating principle. The general operating principle of most current sense amplifiers is illustrated in Fig. 17. Typically, current sense amplifiers connect to the input voltage signal through input resistors  $R_1$ . The input resistors carry a differential mode current  $I_{\text{dif}}$  and possibly a common mode current  $I_{\text{cm}}$ . A large common mode current is necessary to create a sufficient voltage drop in the resistor-based topologies discussed in Section 2. In high voltage topologies, this level shift current is usually not required, although there will typically be some common mode bias current through the resistors. The primary purpose of the input resistors is to transform the differential input voltage into a differential current. To achieve an accurate voltage to current conversion, the two voltages behind the two input resistors must be exactly equal. This ensures that the differential current flowing through the input is exactly equal to  $V_{in}/R_1$ . At the output, a driver produces an output voltage  $V_{out}$, which is sensed with respect to a reference voltage  $V_{ref}$ . Again, the difference between  $V_{out}$  and  $V_{ref}$  is transformed into a differential current through resistors  $R_3$, and again the voltages behind these two resistors should be exactly equal to ensure accuracy. The circuit in between the resistors must adjust the output voltage through a feedback loop such that the differential current on the input side equals the differential current on the output side. Comparing the currents can easily be implemented by adding them.

**Fig. 17** General operating principle of current sense amplifiers
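The balancing action of the feedback loop in Fig. 17 can be sketched numerically. The model below is an idealized assumption of ours: the circuit between the resistors is replaced by a discrete-time integrator (hypothetical parameter `loop_gain`) that moves  $V_{out}$  until the output-side differential current  $(V_{out} - V_{ref})/R_3$  equals the input-side current  $V_{in}/R_1$ . At balance,  $V_{out} = V_{ref} + (R_3/R_1)\,V_{in}$ . The resistor and voltage values are illustrative.

```python
def settle_output(v_in, r1, r3, v_ref=0.0, loop_gain=0.5, steps=200):
    """Idealized integrating loop that balances the input and output
    differential currents of a current sense amplifier (Fig. 17 principle)."""
    v_out = v_ref
    for _ in range(steps):
        i_in = v_in / r1               # differential current through the input resistors
        i_fb = (v_out - v_ref) / r3    # differential current through the output resistors
        error = i_in - i_fb            # comparison of the two currents
        v_out += loop_gain * r3 * error  # integrator; converges for 0 < loop_gain < 2
    return v_out


# Hypothetical example: 10 mV across the shunt, R3/R1 = 50, 1.25 V reference
v_out = settle_output(v_in=10e-3, r1=1e3, r3=50e3, v_ref=1.25)
print(v_out)  # approaches v_ref + (r3 / r1) * v_in = 1.75 V
```

Each iteration reduces the remaining error by a factor  $(1 - \text{loop\_gain})$, so with the values chosen here the output settles to within floating point precision of the ideal  $V_{ref} + (R_3/R_1)V_{in}$.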

In the current sense amplifiers discussed in Section 2, the two differential currents were simply connected together and to the input of the opamp driving the output. In this manner, the input and feedback resistors were terminated at a virtual zero, the two currents were compared, and the difference was amplified to close the feedback loop. In high voltage technologies, additional components are typically inserted in the center of the schematic shown in Fig. 17. The advantage of high voltage technologies is that active components can be used to provide current mode isolation between the two voltage domains inside the current sense amplifier. However, the general operating principle remains the same.

An accurate voltage to current conversion can also be implemented using gain-boosted cascode transistors. Some topologies discussed in Section 3 used such a stage. Such a gain-boosted V-I convertor stage can be applied at the input side as well as at the output side of the circuit. This line of reasoning eventually results in the classification presented in Fig. 18. The first topology on the top left is a generic difference amplifier with the high voltage input stage discussed in Section 3.1. The second topology is the accurate high voltage V-I convertor discussed in Section 3.2, which requires gain boosting on the PMOS input devices. The third topology uses feedback in front of the high voltage V-I convertor and an accurate low-voltage V-I convertor. The last topology on the top row uses accurate V-I convertors on both the input and output signals and an additional loop amplifier to compare the two differential currents and control the output voltage. This topology is probably not cost effective, but it would work. The two topologies on the bottom are not actually extended CMVR voltage difference amplifiers, but extended CMVR voltage to current convertors that dump current into an external load resistor.

**Fig. 18** Classification of current sense amplifiers. Gain boosting on cascode transistors to create an accurate V-I convertor is indicated with an asterisk

## 6 Conclusions and Future Outlook

Different circuit topologies for extended CMVR current sense amplifiers were reviewed. In low-voltage technologies, the input signal can only be accessed by the circuit through resistors that provide a common mode level shift, and this has significant performance disadvantages. Some, but not all, of these can be mitigated using dynamic matching techniques. High voltage technologies have intrinsic advantages because they allow a level shift in the current domain across junctions that block the high input common mode voltage. In high voltage technologies, different topologies are possible, which were classified according to the locations where additional feedback loops are used to create virtual zeros in the circuit.

Development in this field is expected to continue. In battery operated handheld devices, power consumption is a prime concern. Shunt resistances will become smaller to reduce the power loss across the shunt, so the sensed voltage will shrink and accuracy requirements will continue to increase. This is expected to result in the proliferation of dynamic matching techniques in current sense applications [6].

## References

1. V.A. Vashchenko, W. Kindt and P. Hopper: "High voltage on-chip ESD protection in low-voltage BiCMOS process", Journal of Electrostatics, vol. 64, no. 2, p. 104, 2006
2. J.H. Huijsing: "Operational Amplifiers: Theory and Design", Kluwer Academic Publishers, 2001
3. A.P. Brokaw: U.S. patent 6380387, "Dynamic bridge system with common mode range extension", Nov. 22, 2000
4. W.J. Kindt: U.S. patent 6819170, "Apparatus for sensing differential signals with high common mode levels", Nov. 16, 2004
5. G. van den Horn and J.H. Huijsing: "Extension of the common-mode range of bipolar input stages beyond the supply rails of operational amplifiers and comparators", IEEE Journal of Solid-State Circuits, vol. 28, no. 7, 1993
6. J.F. Witte, J.H. Huijsing and K.A.A. Makinwa: "A current-feedback instrumentation amplifier with 5 $\mu$V offset for bidirectional high-side current sensing", ISSCC Conference Proceedings, 2008

# Low-Voltage Power-Efficient Amplifiers for Emerging Applications

A. López-Martin, R.G. Carvajal, E. López-Morillo, L. Acosta,  
T. Sánchez-Rodríguez, C. Rubia-Marcos and J. Ramírez-Angulo

**Abstract** Various design techniques aimed at obtaining low-voltage power-efficient amplifiers are presented. Power efficiency is achieved by employing class AB stages with high current efficiency based on these techniques. The use of resistive local common-mode feedback and quasi-floating gate transistors is covered in detail. Some applications of amplifiers designed using these techniques are included.

## 1 Introduction

Emerging applications in various fields, such as Ambient Intelligence scenarios or remote biomedical monitoring, currently demand wireless sensor networks with transceivers having extremely low power consumption. This is key to decreasing battery weight and size and to increasing battery lifetime, since the battery in these sensing nodes is usually not replaceable. To meet these strict power requirements, several solutions have been proposed at various layers. At the physical layer, power savings are achieved by low-voltage operation and an optimized power-to-performance ratio. Supply voltages of 1 V (or less) are in any case mandatory for reliable operation in modern deep submicron technologies, due to the extremely thin gate oxide. Furthermore, reducing the supply voltage (even where not required) strongly reduces power consumption in digital circuits, since their dynamic power scales with the square of the supply voltage. Although supply scaling is not as straightforward in analog circuits, they should operate at the same supply voltage as the digital part in mixed-mode systems to avoid the complexity of generating several supply voltages.

The canonical way of designing analog circuits consists of using high-gain amplifiers with passive components in negative feedback loops, in either continuous-time or discrete-time form. Sometimes amplifiers are operated in open loop (e.g. Gm-C filters, some VGAs, etc.), and in this case a large linear range is required for the amplifier at the expense of gain. In any case, amplifiers play a key role in analog design, and their power consumption directly impacts that of the overall analog system. Such amplifiers usually take the form of Operational Transconductance Amplifiers (OTAs) with high output resistance, typically driving capacitive loads, or operational amplifiers with low output resistance able to drive low-impedance resistive loads.

---

A. López-Martin (✉)

Department of Electrical & Electronic Engineering, Public University of Navarra,  
31620 Pamplona, Spain

Besides low-voltage and power-efficient operation, these amplifiers should have a fast settling response that is not limited by slew rate. Reconciling all these requirements is difficult with conventional class A topologies, since the bias current limits the maximum output current. Hence a trade-off between slew rate and power consumption exists [1]. To overcome this issue, class AB topologies are often employed. These circuits provide well-controlled quiescent currents, which can be made very low in order to drastically reduce static power dissipation. At the same time, they automatically boost dynamic currents when a large differential input signal is applied, yielding maximum current levels well above the quiescent currents.

Several class AB amplifiers have been proposed. Most of them are based on adaptive biasing techniques, which include extra circuitry that increases the bias currents when a large input signal is present (e.g. by increasing tail currents in differential stages). However, this extra circuitry often increases both power consumption and silicon area, and adds significant parasitic capacitance to the internal nodes. Positive feedback is also often employed to boost dynamic currents, which makes it difficult to guarantee stability over process and temperature variations.

A key issue scarcely considered in the design of class AB amplifiers is power efficiency. This involves not only very low static power dissipation but also high current utilization [2], also called current efficiency (CE), defined as the ratio of the maximum load current to the supply current, i.e.,  $CE = I_{out}^{MAX}/I_{supply}$ . This parameter is essential for optimum power management. To achieve high CE, boosting of the dynamic current should take place at the output stage to avoid internal replication of large transient currents in the amplifier. CE is typically below 0.5 for most reported class AB amplifiers with an output current mirror ratio  $B$  equal to 1 [2], which means that at least half of the supply current is wasted in internal replicas of the differential pair current. In these approaches, the only way to improve current efficiency is to scale the output currents by increasing  $B$ . However, this method increases static power dissipation, as quiescent currents in the output branches are also scaled by  $B$ . Moreover, parasitic capacitances at the internal nodes increase for large  $B$, reducing phase margin.
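The 0.5 limit for  $B = 1$  can be reproduced with a simplified current budget. This is our own idealization, not taken from [2]: we assume that at full slew the whole differential pair current  $I_T$  is steered to one side and copied with ratio  $B$  to the output, so the supply must deliver at least  $I_T + B I_T$  (bias reference branches neglected). Under that assumption  $CE \le B/(1+B)$. The helper names are hypothetical.

```python
def current_efficiency(i_out_max, i_supply):
    """CE = I_out_max / I_supply, as defined in the text."""
    return i_out_max / i_supply


def ce_mirror_ota(b, i_tail):
    """Idealized upper bound on CE when the dynamic current is generated in the
    differential pair (tail current i_tail) and copied to the output with mirror
    ratio b; the supply provides at least i_tail + b * i_tail."""
    return current_efficiency(b * i_tail, i_tail + b * i_tail)


for b in (1, 2, 4, 10):
    print(f"B = {b:2d}: CE <= {ce_mirror_ota(b, 10e-6):.3f}")
```

The bound approaches 1 only as  $B$  grows, which illustrates why these approaches have to scale the output mirrors, with the static power and parasitic capacitance penalties mentioned above.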

In this work we illustrate the use of new circuit design techniques to obtain low-voltage class AB amplifiers that combine simplicity and power efficiency. These techniques allow class AB operation to be introduced at the input stage and at the active load of the amplifier with minimal penalty in other performance parameters. Section 2 presents the concept of Super Class AB amplifiers and various circuit implementations. As an application, a Sample and Hold circuit is described in Section 3. Section 4 covers the design of class AB amplifiers using quasi-floating gate transistors. Their application in a VGA and a  $\Sigma\Delta$  modulator for wearable electroencephalogram monitoring is described in Section 5.

## 2 Super Class AB Amplifiers

The first approach presented is what we name “Super Class AB” amplifiers [3]. They are single-stage class AB amplifiers that achieve dynamic current boosting at both the differential input stage and at the active load. This double boosting allows very large dynamic currents (ideally proportional to  $V_{id}^4$  with  $V_{id}$  the differential input voltage) and at the same time very high current efficiency since the large dynamic current is generated in the output branch without internal replication. Class AB operation at the input stage is achieved by low-voltage adaptive biasing techniques, while class AB operation in the active load is achieved using resistive Local Common-Mode Feedback (LCMFB) [4].

### 2.1 Principle of Operation

Figure 1 shows how a conventional OTA [Fig. 1(a)] can be converted into a Super Class AB OTA [Fig. 1(b)]. An adaptive biasing circuit provides very small quiescent currents  $I_{BIAS}/2$  to  $M_1$  and  $M_2$ . When this adaptive circuit senses a large differential input, it automatically boosts the bias current it provides.

Additional current boosting is obtained by LCMFB via the matched resistors  $R_1$  and  $R_2$ . When no differential input is present, currents  $I_1$  and  $I_2$  in  $M_1$  and  $M_2$  are identical and very small, and no current flows through these resistors. However, upon application of a differential signal, current  $I_R = (I_1 - I_2)/2 = I_d/2$  flows through the resistors, leading to complementary voltage swings at nodes A and B whose maximum value is  $RI_R^{MAX}$ , where  $R = R_1 = R_2$ . If node A has the largest positive swing, it leads to a peak current in  $M_5$  given by

$$I_{MAX} = \frac{\beta_5}{2} (V_c + RI_R^{MAX} - V_{TH})^2 = \frac{\beta_5}{2} \left( \sqrt{\frac{2I_{cm}}{\beta_{6,7}}} + RI_R^{MAX} \right)^2 \quad (1)$$

where saturation is assumed for  $M_5$ ,  $V_c$  is the voltage at node C,  $I_{cm} = (I_1 + I_2)/2$ , and  $\beta_i$  is the transconductance gain of transistor  $M_i$ . If the node with the largest positive voltage swing is B, then  $I_{MAX}$  flows through  $M_8$ . In either case, the OTA output current is approximately  $I_{MAX}$ . Hence additional current boosting is achieved if  $RI_R$  is large enough. Another advantage of LCMFB is the increase in the gain-bandwidth (GBW) product of the OTA: compared with the topology of Fig. 1(a), it increases by a factor  $g_{m5,8}R_{A,B}$ , where  $R_{A,B} = R||r_{o6,7}||r_{o1,2}$  [3].

**Fig. 1** (a) Class A single-stage amplifier (b) Super Class AB amplifier
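Equation (1) can be evaluated directly to see how much the resistive swing  $RI_R^{MAX}$  adds to the peak current. The sketch below simply codes Eq. (1) with  $M_5$  in saturation; the device parameters and currents in the example are hypothetical and not taken from the test chips.

```python
import math


def i_max_lcmfb(beta5, beta67, i_cm, r, i_r_max):
    """Peak current of M5 from Eq. (1), M5 assumed in saturation:
    I_MAX = beta5/2 * (sqrt(2*I_cm/beta67) + R*I_R_max)**2."""
    v_ov_quiescent = math.sqrt(2 * i_cm / beta67)  # V_c - V_TH
    return beta5 / 2 * (v_ov_quiescent + r * i_r_max) ** 2


# Hypothetical values: matched M5 and M6,7 (200 uA/V^2), I_cm = 5 uA, R = 10 kOhm
beta = 200e-6
print("no LCMFB swing:", i_max_lcmfb(beta, beta, 5e-6, 0.0, 0.0))
print("with R*I_R = 0.5 V:", i_max_lcmfb(beta, beta, 5e-6, 10e3, 50e-6))
```

Without resistive swing and with matched devices the expression reduces to  $I_{cm}$, while a 0.5 V swing in this example raises the peak current by about an order of magnitude, which is the additional boosting described above.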

### 2.2 Examples of Super Class AB Amplifiers

Different Super Class AB topologies can be obtained by using different adaptive biasing techniques in Fig. 1(b). Figure 2 shows three alternatives suited for low-voltage operation. The scheme of Fig. 2(a) [5], [6] consists of two matched transistors  $M_1$  and  $M_2$  cross-coupled by two dc level shifters. Under quiescent conditions  $V_{SG1}^Q = V_{SG2}^Q = V_B$ , so transistors  $M_1$  and  $M_2$  carry equal quiescent currents controlled by  $V_B$ . If  $V_B$  is only slightly larger than the MOS threshold voltage  $|V_{TH}|$ , very low standby currents can be achieved. However, when, for instance,  $V_{IN+}$  decreases, the voltage at the source of  $M_1$  decreases by the same amount, whereas the source voltage of  $M_2$  stays constant. Therefore, the current through  $M_2$  increases, whereas the current through  $M_1$  decreases. The maximum swing of these currents can be much larger than the quiescent current. The level shifters must have very low output impedance and also be able to source large currents when the circuit is charging or discharging a large load capacitance. They should be simple because of noise, speed, and supply restrictions. A good choice is shown in Fig. 2(b). Each level shifter is built using two transistors ( $M_3, M_5$  or  $M_4, M_6$ ) and a current source  $I_{BIAS}$ . We call these level shifters “Flipped Voltage Followers” (FVF) [7]. They have a very low output resistance (typically tens of ohms) and fulfil the aforementioned requirements. The quiescent current in  $M_1$  and  $M_2$  is the well-controlled bias current  $I_{BIAS}$  of the FVF, assuming that transistors  $M_1, M_2, M_3$  and  $M_4$  are matched.
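The class AB behavior of the cross-coupled scheme of Fig. 2(a) can be sketched with the simple square law. We assume the connection implied by the description: the source of  $M_1$  follows  $V_{IN+}$  through a level shifter while its gate sits at  $V_{IN-}$, and vice versa for  $M_2$, so that  $V_{SG1} = V_B + v_{id}$  and  $V_{SG2} = V_B - v_{id}$  with  $v_{id} = V_{IN+} - V_{IN-}$. Device values are hypothetical, and second order effects (body effect, mobility degradation) are ignored.

```python
def square_law(beta, v_sg, v_th):
    """Simple saturation-region square law; returns 0 when the device is off."""
    v_ov = v_sg - v_th
    return 0.5 * beta * v_ov * v_ov if v_ov > 0 else 0.0


def cross_coupled_currents(v_id, v_b, v_th, beta):
    """Drain currents (I1, I2) of M1 and M2 in the Fig. 2(a) scheme,
    assuming V_SG1 = V_B + v_id and V_SG2 = V_B - v_id."""
    return square_law(beta, v_b + v_id, v_th), square_law(beta, v_b - v_id, v_th)


# Hypothetical values: |V_TH| = 0.5 V, V_B = 0.6 V, beta = 200 uA/V^2
i_q, _ = cross_coupled_currents(0.0, 0.6, 0.5, 200e-6)
for v_id in (0.0, -0.05, -0.1, -0.3):  # V_IN+ decreasing, as in the text
    i1, i2 = cross_coupled_currents(v_id, 0.6, 0.5, 200e-6)
    print(f"v_id = {v_id:+.2f} V: I1 = {i1 * 1e6:.2f} uA, I2 = {i2 * 1e6:.2f} uA "
          f"({i2 / i_q:.1f} x I_Q)")
```

With these values the quiescent current is 1 µA, yet a 0.3 V input step drives  $M_2$  to 16 times that value while  $M_1$  turns off; in a class A pair the current difference could never exceed the tail current.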



**Fig. 2** Adaptive biasing topologies. Using two level shifters: (a) diagram, (b) circuit. Using CMS: (c) diagram, (d) circuit. Using WTA: (e) diagram, (f) circuit

An alternative technique is shown in Fig. 2(c) [8]. A single dc level shifter sets the voltage at the common source node of the input differential pair. This voltage is the common mode voltage of the inputs ( $V_{CM}$ ) shifted by  $V_B$ . Under quiescent conditions,  $V_{SG1}^Q = V_{SG2}^Q = V_B$ . Therefore,  $V_B$  controls the quiescent currents in a similar way to Fig. 2(a). When a differential input is applied, an imbalance in the drain currents is produced that is not limited by the quiescent current. A very efficient implementation of this level shifter is again the FVF, and the resulting circuit is shown in Fig. 2(d). The FVF bias current  $I_{BIAS}$  is the quiescent current of the input differential pair, assuming matched transistors  $M_1$ ,  $M_2$  and  $M_3$ . A circuit, named CMS in Fig. 2(c), is required to sense the common mode input voltage  $V_{CM}$  and to apply it to the gate of transistor  $M_3$, thus making the quiescent currents independent of the input common mode voltage and leading to a high Common Mode Rejection Ratio (CMRR).

Figure 2(e) shows a modification of the idea in Fig. 2(c), where a Winner-Take-All (WTA) circuit replaces the common mode sensing circuit [3]. The output of the WTA circuit is the maximum (the “winner”) of the input voltages. Therefore, voltage at the common source node of the differential pair is the maximum input voltage  $V_{MAX}$  shifted by the constant voltage  $V_B$ . Under quiescent conditions, input voltages are equal, and their maximum value corresponds to the common mode input voltage.
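The difference between the CMS-based scheme of Fig. 2(c) and the WTA-based scheme of Fig. 2(e) lies only in which voltage the level shifter follows. The sketch below, built on the same square-law idealization with hypothetical device values, assumes PMOS input devices with the gate of  $M_1$  at  $V_{IN+}$  and the gate of  $M_2$  at  $V_{IN-}$, and a common source node held at  $V_X + V_B$, where  $V_X$  is either the input common mode (CMS) or the larger input (WTA).

```python
def square_law(beta, v_sg, v_th):
    """Simple saturation-region square law; returns 0 when the device is off."""
    v_ov = v_sg - v_th
    return 0.5 * beta * v_ov * v_ov if v_ov > 0 else 0.0


def pair_currents(v_inp, v_inn, v_b, v_th, beta, mode):
    """Drain currents (I1, I2) of a PMOS pair whose common source is driven to
    V_X + V_B, with V_X the common mode ('cms', Fig. 2(c)) or the maximum
    input ('wta', Fig. 2(e))."""
    if mode == "cms":
        v_x = 0.5 * (v_inp + v_inn)
    elif mode == "wta":
        v_x = max(v_inp, v_inn)
    else:
        raise ValueError("mode must be 'cms' or 'wta'")
    v_s = v_x + v_b  # common source voltage imposed by the level shifter
    return square_law(beta, v_s - v_inp, v_th), square_law(beta, v_s - v_inn, v_th)


# Hypothetical values: |V_TH| = 0.5 V, V_B = 0.6 V, beta = 200 uA/V^2, 0.2 V input
for mode in ("cms", "wta"):
    i1, i2 = pair_currents(1.1, 0.9, 0.6, 0.5, 200e-6, mode)
    print(f"{mode}: I1 = {i1 * 1e6:.2f} uA, I2 = {i2 * 1e6:.2f} uA")
```

In the WTA case the transistor with the higher gate voltage (the "winner") keeps exactly its quiescent  $V_{SG} = V_B$, while the other sees the full differential input added to its  $V_{SG}$; under quiescent conditions both schemes reduce to  $V_{SG} = V_B$  for both devices, as stated in the text.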

Increasing  $R_1 = R_2 = R$  in the Super Class AB amplifier increases dc gain, GBW and slew rate. Unfortunately, the maximum value of  $R$  is limited for stability reasons: it can be shown that phase margin decreases as  $R$  increases [3]. A different approach, illustrated in Fig. 3, allows  $R$  to be increased to achieve high open loop gain and very high slew rate while preserving stability. Although increasing  $R$  lowers the frequency of the internal poles at nodes A and B, stable behavior is still possible for moderate capacitive loads if phase lead compensation is used at the output node, by means of a series resistor  $R_c$  that creates a left half plane zero at a frequency  $\omega_{pz} = 1/(R_c C_L)$ . The zero (partially) compensates for the phase shift of the internal non-dominant poles. The circuit of Fig. 3(a) has a conventional class A input stage, and the circuit of Fig. 3(b) has adaptive biasing following the idea of Fig. 2(c). The difference with respect to the implementation of Fig. 2(d) is that a folded Flipped Voltage Follower [7] is now used instead of an FVF. Input common mode sensing is carried out by a capacitive divider, so the inputs are not resistively loaded. The detected input common-mode voltage  $V_{icm}$  is downshifted by the  $V_{GS}$  of  $M_F$  and applied to the common source of transistors  $M_1$  and  $M_2$ . This is a very low impedance node, so  $M_1$  and  $M_2$  form a pseudo-differential pair with drain currents not bounded by the bias current.

**Fig. 3 (a)** OTA with LCMFB and phase lead compensation **(b)** Same OTA but including adaptive biasing at the input stage
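To get a feeling for where the phase lead zero  $\omega_{pz} = 1/(R_c C_L)$  lands, it can be evaluated for the  $R_c = 330\,\Omega$  series resistor and the 80 pF and 500 pF loads used in the measurements of Section 2.3. This is only a back-of-the-envelope calculation of the zero frequency; it says nothing about the location of the internal poles, which would be needed to judge the resulting phase margin.

```python
import math


def lead_zero_hz(r_c, c_l):
    """Frequency in Hz of the left half plane zero omega_pz = 1/(R_c * C_L)."""
    omega_pz = 1.0 / (r_c * c_l)  # rad/s
    return omega_pz / (2 * math.pi)


R_C = 330.0  # ohm, series compensation resistor of the test chip (Section 2.3)
zeros = {c_l: lead_zero_hz(R_C, c_l) for c_l in (80e-12, 500e-12)}
for c_l, f_z in zeros.items():
    print(f"C_L = {c_l * 1e12:.0f} pF -> f_z = {f_z / 1e6:.2f} MHz")
```

The zero lies around 6 MHz for 80 pF and below 1 MHz for 500 pF, so it moves down as the capacitive load grows.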

### 2.3 Measurement Results

A test chip prototype containing the three Super Class AB OTAs with the adaptive biasing of Figs. 2(b), 2(d), and 2(f) was fabricated in a  $0.5\,\mu\text{m}$  CMOS process, with nominal nMOS and pMOS threshold voltages of about  $0.67\text{ V}$  and  $-0.96\text{ V}$ , respectively. The resistors had values of  $10\text{ k}\Omega$  and were implemented using interdigitated polysilicon strips. However, linearity is not critical for the resistors: they can also be implemented by MOS transistors in the triode region, as will be shown in Section 2.4, saving active area with the additional advantage that the resistance value can be readily programmed using a bias voltage [9]. Supply voltages were  $\pm 1\text{ V}$ , and the bias current  $I_B$  was set to  $10\ \mu\text{A}$ . Figure 4(a) shows a microphotograph of the chip, where the location and relative area of the three OTAs can be observed.

The transient response of the OTAs was measured by connecting them in unity-gain configuration and applying a 1-MHz square wave at the input. The output terminal was connected directly to a bonding pad and no external buffer was employed, so the load capacitance, approximately  $80\text{ pF}$, corresponds to the pad, breadboard, and test probe capacitance. Figure 4(b) shows in solid line the output of the proposed OTAs and the output of the conventional class A OTA. The input is the dotted waveform, almost indistinguishable from the output of the proposed OTAs. Supply voltage, load, and quiescent currents were identical for all the OTAs. The Super Class AB OTAs increase the slew rate by more than two orders of magnitude.

**Fig. 4** (a) Micrograph of the chip containing the super class AB OTAs (b) Transient response of the OTAs with input stage of Fig. 2b (upper graph), Fig. 2d (middle graph), and Fig. 2f (lower graph)

**Fig. 5** (a) Pulse response of conventional Miller OTA with phase lead compensation (*upper trace*) and proposed circuit of Fig. 3a (*lower trace*).  $C_L$  is 80 pF. Horizontal axis: 1  $\mu$ s/div, vertical axis: 1 V/div. (b) Response of circuits of Fig. 3b (*top trace*), 3a (*middle trace*) and conventional Miller OTA with phase lead compensation (*bottom trace*) for  $C_L = 500$  pF

The circuits of Fig. 3 were fabricated in the same technology, along with a conventional Miller OTA with phase lead compensation for comparison. Resistors were  $R = 50\text{ k}\Omega$  and  $R_c = 330\ \Omega$  (on chip), and unit transistor sizes were  $W/L = 25/1$  and  $60/1$  for PMOS and NMOS transistors, respectively.  $I_B$  was  $50\ \mu\text{A}$. The circuits were tested in voltage follower configuration, with the common drain of  $M_4$  and  $M_8$  connected directly on chip to the negative input terminal. A  $250\text{ kHz}$,  $2\text{ Vpp}$  input pulse signal and supply voltages  $V_{DD} = 2.2\text{ V}$,  $V_{SS} = -2.2\text{ V}$  were used. Figure 5(a) shows the measured pulse response of the conventional Miller OTA with phase lead compensation and of the circuit of Fig. 3(a) with  $C_L = 80\text{ pF}$. The slew rates are  $\text{SR} = 1\text{ V}/\mu\text{s}$  for the conventional OTA and  $\text{SR} = 16\text{ V}/\mu\text{s}$  for the OTA of Fig. 3(a). Figure 5(b) compares the pulse responses of the conventional OTA and the circuits of Fig. 3 with  $C_L = 500\text{ pF}$. In this case the slew rates were  $\text{SR} = 10\text{ V}/\mu\text{s}$  for the circuit of Fig. 3(b),  $\text{SR} = 2.5\text{ V}/\mu\text{s}$  for the circuit of Fig. 3(a) and  $\text{SR} = 0.2\text{ V}/\mu\text{s}$  for the class A Miller OTA. These correspond to maximum output currents of  $5\text{ mA}$,  $1.1\text{ mA}$  and  $100\ \mu\text{A}$, respectively. These measurements validate the efficient class AB behavior of the proposed structures.
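The quoted currents follow from  $I = C_L \cdot \text{SR}$. As a quick consistency check, the sketch below applies that relation to the three slew rates measured with  $C_L = 500$  pF. It treats the load as a pure capacitance and ignores any other current path at the output; the function name is illustrative.

```python
def peak_output_current_a(c_load_f: float, slew_rate_v_per_us: float) -> float:
    """Current needed to slew C_L at the given rate: I = C_L * dV/dt."""
    return c_load_f * slew_rate_v_per_us * 1e6  # V/us -> V/s


C_LOAD_F = 500e-12  # load used for Fig. 5(b)
measured_sr = {      # slew rates reported above, in V/us
    "Fig. 3(b)": 10.0,
    "Fig. 3(a)": 2.5,
    "class A Miller": 0.2,
}

for circuit, sr in measured_sr.items():
    i_peak = peak_output_current_a(C_LOAD_F, sr)
    print(f"{circuit:>15}: SR = {sr:5.1f} V/us -> I = {i_peak * 1e3:.3f} mA")
```

This gives 5 mA, 1.25 mA and 0.1 mA. The first and last match the quoted values exactly; the second is close to, though slightly above, the 1.1 mA given in the text, so the relation is best read as a first-order check.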

### 2.4 Application Example

To illustrate a possible application of super Class AB amplifiers, Fig. 6 shows a Sample and Hold (S/H) circuit employing them [9]. It operates in two non-overlapping clock phases,  $\phi_1$  and  $\phi_2$. During phase  $\phi_1$  switches  $S_1$  and  $S_2$  are closed and switches  $S_3$  remain open. Therefore, the differential voltage sampled on the capacitors is  $V_{id} - V_{off}$, where  $V_{off}$  is the input offset voltage of the amplifier.  $S_1$  turns off slightly after  $S_2$  for proper operation. During phase  $\phi_2$  switches  $S_1$  and  $S_2$  are open and switches  $S_3$  are closed, so the differential output voltage is held at the value that  $V_{id}$  had at the end of phase  $\phi_1$, irrespective of the input offset. In our application, an incremental A/D converter, the output switches are used to disconnect the S/H circuit when a valid output is not present.

**Fig. 6** S/H circuit

The Super Class AB OTA used in the S/H circuit of Fig. 6 is shown in Fig. 7(a). As compared to the basic topology in Fig. 1(b), several enhancements are included. First, a fully balanced topology is used, yielding similar positive and negative settling behavior. Second, dc gain is increased by the use of a cascode output stage, leading to higher settling accuracy. Third, resistors are implemented by MOS transistors in the ohmic region ( $M_{13}$  and  $M_{14}$ ), which saves area, allows the use of simpler CMOS processes without high resistance poly layers, and allows adjustment of the resistance value via  $V_{RES}$  to achieve a target phase margin for a given  $C_L$ .

The output common-mode feedback (CMFB) circuit employed is the well-known topology shown in Fig. 7(b).  $V_{OCM}$  is the desired common mode output voltage, and  $V_{BIAS}$  is the nominal bias gate voltage in  $M_5-M_6$ .  $V_{CMF}$  is the voltage applied to the gate of  $M_5-M_6$ . The circuit uses two non-overlapping clock phases. During the first phase, capacitors  $C_1$  sample the voltage  $V_{OCM} - V_{BIAS}$ . During the second one,  $C_1$  is in parallel with  $C_2$ , so  $V_{CMF}$  is updated to keep the output common mode voltage equal to  $V_{OCM}$ .
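The update performed by  $C_1$  and  $C_2$  can be illustrated with a simple charge-conservation model. The sketch below assumes ideal switches and does not model the amplifier's common-mode loop: each clock cycle shares charge between  $C_1$  (reset to  $V_{OCM} - V_{BIAS}$) and  $C_2$, so the dc voltage across  $C_2$  converges to  $V_{OCM} - V_{BIAS}$, and  $V_{CMF}$  then equals  $V_{BIAS}$  when the output common mode sits at  $V_{OCM}$. The capacitor and bias values, and the function name, are hypothetical.

```python
def sc_cmfb_c2_voltage(v_ocm, v_bias, c1, c2, v_c2_initial, cycles):
    """Voltage across C2 after each charge-sharing phase of an SC CMFB.

    Idealized model (an assumption, not a transistor-level simulation):
    in one phase C1 is reset to (v_ocm - v_bias); in the other it is placed
    in parallel with C2 and the two share charge. Switches are ideal and the
    amplifier's common-mode loop is not modeled; only C2's voltage is tracked.
    """
    target = v_ocm - v_bias
    v_c2 = v_c2_initial
    history = []
    for _ in range(cycles):
        # Charge conservation when C1 (held at target) is connected across C2.
        v_c2 = (c1 * target + c2 * v_c2) / (c1 + c2)
        history.append(v_c2)
    return history


# Hypothetical values: C1 = 0.2 pF, C2 = 1 pF (ratios only matter),
# V_OCM = 0 V (mid-supply of a dual supply), V_BIAS = 0.3 V.
trace = sc_cmfb_c2_voltage(v_ocm=0.0, v_bias=0.3, c1=0.2, c2=1.0,
                           v_c2_initial=0.0, cycles=50)
print(f"after 1 cycle: {trace[0]:.4f} V, after 50 cycles: {trace[-1]:.6f} V")
```

The error shrinks by a factor  $C_2/(C_1 + C_2)$  per cycle, which illustrates the usual trade-off in this circuit: a small  $C_1$  disturbs the loop less on each update but needs more cycles to converge.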

The S/H circuit has been fabricated in the same  $0.5\text{ }\mu\text{m}$  CMOS technology as the other Super Class AB amplifiers. The total silicon area is  $0.075\text{ mm}^2$. Poly-poly capacitors of  $1\text{ pF}$  were used. The output was connected directly to a bonding pad and an external buffer was employed, so the load capacitance corresponds to the pad, breadboard and input buffer capacitance. The ideal and measured waveforms for a sampling rate of  $31\text{ kHz}$  can be observed in Fig. 8. Note that the output is only available during phase  $\phi_2$. The settling time for this large load capacitance is  $1.9\text{ }\mu\text{s}$, and the pedestal error is  $900\text{ }\mu\text{V}$. The measured droop rate is  $-0.75\text{ mV}/\mu\text{s}$. The quiescent power consumption is only  $80\text{ }\mu\text{W}$  with a dual supply voltage of  $\pm 1.35\text{ V}$.



**Fig. 7** (a) Super Class AB OTA (b) CMFB circuit



**Fig. 8** Measured transient response of the S/H circuit

## 3 Class AB Amplifiers Based on Quasi-Floating Gate Transistors

In this Section another power-efficient approach to class AB operation is presented. Figure 9(a) shows a typical scheme of a class AB amplifying stage. It is based on a floating battery that allows node B to track the voltage at node A with a dc level shift  $V_{bat}$. Under quiescent conditions, the quiescent current is set by the voltage at node A and the dc level shift. Under dynamic conditions, signal variations at node A are transferred to node B, allowing the stage to provide output currents not limited by the quiescent current. The dc level shift has been implemented in several ways, e.g. using diode-connected transistors or resistors biased by dc currents. The disadvantage of these approaches is that the battery requires extra quiescent power consumption and silicon area. Besides, the quiescent current is often not accurately set and depends on process and temperature variations, and the parasitics added by this extra circuitry may limit bandwidth.



**Fig. 9** (a) Basic class AB stage using floating battery (b) Implementation of battery using QFG transistor (c) Possible implementations of  $R_{large}$

### 3.1 Principle of Operation

Figure 9(b) shows an efficient implementation of this dc level shift using a Quasi-Floating Gate (QFG) transistor [10, 11]. It is a transistor whose gate (node B) is weakly connected to a dc bias voltage  $V_B$  through a large resistance  $R_{large}$, while the input signal is applied to the gate through a capacitor  $C_{bat}$. Hence, the quiescent current of the output branch is accurately set to the bias current  $I_B$, regardless of thermal and process variations, because it is set by a current mirror. Under dynamic conditions, the voltage at node A is transferred to node B after being high-pass filtered with a cutoff frequency  $1/(2\pi R_{large} C_{bat})$. Due to the large resistance employed (in the order of gigaohms), this cutoff frequency is typically below 1 Hz, so in practice only the dc component of the voltage at node A is not transferred to node B. The large resistance  $R_{large}$  does not need to have a precise value as long as it is high enough to provide a cutoff frequency  $1/(2\pi R_{large} C_{bat})$  lower than the minimum frequency component at node A that must be transferred to node B. Hence, process, voltage and temperature variations affecting the value of  $R_{large}$  are not relevant, and it can be implemented as shown in Fig. 9(c) by a minimum-size diode-connected MOS transistor in cutoff region or by a minimum-size transistor biased in subthreshold region by another identical transistor, leading to a compact and power-efficient implementation.

Note that the implementation of the dc level shifter in Fig. 9(b) does not require additional quiescent power consumption. The increase in silicon area is modest, as  $R_{large}$  is made by a minimum-size MOS transistor and  $C_{bat}$  can be small (its minimum value is imposed by the parasitic capacitance at node B).
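The corner frequency is worth evaluating numerically, because it shows how large  $R_{large}$  must actually be. The sketch below assumes  $C_{bat} = 3$  pF (the value used in the prototype of Section 3.3) and treats  $R_{large}$  as a linear resistor, which is a simplification for the MOS implementations of Fig. 9(c). The function names are illustrative.

```python
import math


def qfg_cutoff_hz(r_large_ohm: float, c_bat_f: float) -> float:
    """High-pass corner 1/(2*pi*R_large*C_bat) from node A to node B."""
    return 1.0 / (2.0 * math.pi * r_large_ohm * c_bat_f)


def min_r_large_ohm(f_max_hz: float, c_bat_f: float) -> float:
    """Smallest R_large that keeps the corner at or below f_max_hz."""
    return 1.0 / (2.0 * math.pi * f_max_hz * c_bat_f)


C_BAT_F = 3e-12  # C_bat of the prototype described in Section 3.3

for r_large in (1e9, 1e11, 1e12):
    f_c = qfg_cutoff_hz(r_large, C_BAT_F)
    print(f"R_large = {r_large:.0e} ohm -> f_c = {f_c:.3g} Hz")

print(f"R_large for f_c <= 1 Hz: {min_r_large_ohm(1.0, C_BAT_F):.3g} ohm")
```

With 3 pF, 1 GΩ gives a corner of about 53 Hz, so a corner below 1 Hz calls for  $R_{large}$  of roughly 53 GΩ or more. Such values are out of reach for an integrated resistor but can be obtained from a minimum-size MOS device in cutoff or subthreshold.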

The positive slew rate of the circuit of Fig. 9(b) is  $SR = I_{max}/C_L$, where  $I_{max}$  is the maximum available output current and  $C_L$  the load capacitance. For a class A implementation  $I_{max} = I_B$  and therefore  $SR = I_B/C_L$. In the circuit of Fig. 9(b), when there is no input signal,  $I_2 = I_B$  and

$$V_{SG2} = V_{SG2}^Q = \sqrt{\frac{2I_B}{\beta_2}} + |V_{TH2}| \quad (2)$$

where  $V_{TH2}$  and  $\beta_2 = \mu_p C_{ox} (W/L)_{M2}$  are the threshold voltage and transconductance factor, respectively, of the pMOS transistor  $M_2$. When a large differential input  $V_{in}$  is applied, this input variation is transferred to the gate of  $M_2$, leading to a drain current:

$$I_2 = \frac{\beta_2}{2} \left( V_{SG2}^Q + kV_{in} - |V_{TH2}| \right)^2 = \frac{\beta_2}{2} \left( \sqrt{\frac{2I_B}{\beta_2}} + kV_{in} \right)^2 \quad (3)$$

Constant  $k$  is the attenuation due to the capacitive divider formed by  $C_{bat}$  and the parasitic capacitance  $C_B$  at node B, and is given by  $k = C_{bat}/(C_B + C_{bat})$. It is approximately 1 if  $C_{bat}$  is chosen sufficiently larger than  $C_B$. Note from (3) that current  $I_2$  is not bounded by  $I_B$, reflecting the class AB operation of the stage. For large positive  $V_{in}$  the output current is  $I_{out} \approx I_2$, and the maximum output current amplitude for an input step  $V_{step}$  is approximately given by:

$$I_{MAX} \approx \frac{\beta_2}{2} \left( \sqrt{\frac{2I_B}{\beta_2}} + k \frac{|V_{step}|}{2} \right)^2 \quad (4)$$

which leads to a  $SR$  increase over the class A topology given by:

$$\frac{SR_{AB}}{SR_A} = \frac{I_{MAX,AB}}{I_{MAX,A}} \approx \frac{\beta_2}{2I_B} \left( \sqrt{\frac{2I_B}{\beta_2}} + k \frac{|V_{step}|}{2} \right)^2 \quad (5)$$

The analysis is only approximate, as it assumes an ideal square-law MOS I-V characteristic, but it provides insight into the large-signal class AB operation of the circuit.
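Equations (4) and (5) can be evaluated directly. The sketch below computes  $k$  and the slew-rate enhancement under the same ideal square-law assumption; the device values ( $\beta_2 = 200\ \mu\text{A/V}^2$,  $I_B = 10\ \mu\text{A}$,  $C_B = 0.3$  pF) and the function names are hypothetical, and only  $C_{bat} = 3$  pF is taken from Section 3.3.

```python
import math


def attenuation_k(c_bat: float, c_b: float) -> float:
    """Capacitive-divider factor k = C_bat / (C_B + C_bat)."""
    return c_bat / (c_b + c_bat)


def sr_enhancement(beta2: float, i_b: float, v_step: float, k: float) -> float:
    """SR_AB / SR_A from Eq. (5), ideal square-law model."""
    v_ov = math.sqrt(2.0 * i_b / beta2)  # quiescent overdrive of M_2, from Eq. (2)
    i_max_ab = 0.5 * beta2 * (v_ov + k * abs(v_step) / 2.0) ** 2  # Eq. (4)
    return i_max_ab / i_b  # class A stage: I_MAX = I_B


# Hypothetical device values, for illustration only.
k = attenuation_k(c_bat=3e-12, c_b=0.3e-12)
for v_step in (0.25, 0.5, 1.0):
    ratio = sr_enhancement(beta2=200e-6, i_b=10e-6, v_step=v_step, k=k)
    print(f"k = {k:.3f}, V_step = {v_step:.2f} V -> SR_AB/SR_A = {ratio:.2f}")
```

Writing  $V_{ov} = \sqrt{2I_B/\beta_2}$, Eq. (5) reduces to  $\left(1 + k|V_{step}|/(2V_{ov})\right)^2$: the enhancement grows quadratically with the step and is larger when the quiescent overdrive is small.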

This way of implementing a power-efficient dc level shift can be applied to nearly any circuit topology that requires such a shift at a high-impedance node, whether to achieve class AB operation or for any other purpose. Some examples are shown in the following subsections.

### 3.2 Two-Stage QFG Class AB Amplifiers

Figure 10 shows how a conventional two-stage Miller amplifier, shown in Fig. 10(a), can be transformed into a power-efficient class AB amplifier [12] by using the proposed QFG technique to implement the battery of Fig. 10(b). Figure 10(c) shows the resulting single-ended implementation and Fig. 10(d) the fully differential version. The fully differential version operates essentially the same way, but it offers improved rejection of common-mode and power-supply noise and interference. The CMFB circuit is not shown for simplicity.

Note that the only difference between the class AB amplifier of Fig. 10(c) and the conventional version of Fig. 10(a) is that the output transistor  $M_7$  is dynamically biased using the circuit of Fig. 9(b). Hence the output quiescent current has the same value  $I_B$  as in the class A version of Fig. 10(a). The quiescent current in  $M_6$  is also  $I_B$  if its  $W/L$  is twice that of  $M_3$  and  $M_4$. In dynamic operation, when the output of the amplifier is slewing, the voltage at node A experiences a large swing, which is transferred to node B since capacitor  $C_{bat}$  cannot change its charge rapidly through  $M_{Rlarge}$. Hence class AB (push-pull) operation of the output stage is achieved.

Depending on the compensation and load capacitors, the slew rate of the class AB amplifier can be limited by the first stage because of the limited current available to charge  $C_C$. In this case a low-voltage class AB differential input stage, such as those shown in Fig. 2, can be used to achieve high slew rate at both the internal node and the output node of the op-amp.



**Fig. 10** (a) Conventional class A two stage op-amp (b) Conceptual scheme of two stage op-amp with class AB output stage using floating battery (c) Implementation of class AB output stage using QFG technique (d) Fully differential class AB op-amp

### 3.3 Measurement Results

The circuits of Fig. 10(a) and Fig. 10(c) were fabricated in the same  $0.5\mu\text{m}$  CMOS technology as in Section 2.3. The following transistor sizes (in  $\mu\text{m}$) and component values were used:  $M_1, M_2 : 30/1$,  $M_B, M_3, M_6 : 60/1$,  $M_4, M_5 : 10/1$,  $M_7 : 20/1$,  $M_{Rlarge} : 2/1$,  $C_{bat} = 3\text{ pF}$,  $C_c = 1\text{ pF}$,  $R_c = 10\text{ k}\Omega$. Figure 11(a) shows the microphotograph of the fabricated chip. The area of the class AB op-amp is  $195 \times 63\text{ }\mu\text{m}^2$. The circuits were tested with a single supply  $V_{DD} = 2\text{ V}$, a bias current  $I_B = 10\mu\text{A}$  and  $C_L = 25\text{ pF}$. The measured open-loop ac response of the class A and class AB amplifiers is shown in Fig. 11(b). As expected, the dc gain is the same, nearly 45 dB, and so is the gain-bandwidth product, about 11 MHz. However, the unity-gain frequency is 6 MHz for the circuit of Fig. 10(a) and 11 MHz for the circuit of Fig. 10(c). This is because the non-dominant output pole of the circuit of Fig. 10(a) lies at a lower frequency.

Figures 12(a) and 12(b) show the measured input and output waveforms of the op-amps of Figs. 10(a) and 10(c), respectively, for a 250 kHz, 1 Vpp input square wave.



**Fig. 11** (a) Micrograph of the class A and class AB amplifiers (b) Measured open-loop frequency response of the class A and class AB amplifiers



**Fig. 12** Measured pulse response input and output waveforms (a) Conventional class A amplifier (b) Class AB amplifier

The corresponding measured slew rates were  $0.41 \text{ V}/\mu\text{s}$  and  $20 \text{ V}/\mu\text{s}$, respectively, so the slew-rate enhancement factor is approximately 50. The large overshoot in the response of the class A op-amp indicates its reduced phase margin with respect to the class AB op-amp.
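These numbers are consistent with the class A limit  $\text{SR} = I_B/C_L$. The sketch below converts the two measured slew rates into the current each output must deliver into  $C_L = 25$  pF, treating the load as purely capacitive (parasitic capacitances are neglected). The helper name is illustrative.

```python
def slew_current_a(c_load_f: float, slew_rate_v_per_us: float) -> float:
    """Current implied by a slew rate into a capacitive load: I = C_L * SR."""
    return c_load_f * slew_rate_v_per_us * 1e6


C_LOAD_F = 25e-12   # load used in Section 3.3
I_BIAS_A = 10e-6    # bias current I_B used in Section 3.3

i_class_a = slew_current_a(C_LOAD_F, 0.41)   # measured, Fig. 10(a)
i_class_ab = slew_current_a(C_LOAD_F, 20.0)  # measured, Fig. 10(c)

print(f"class A : {i_class_a * 1e6:.2f} uA (I_B = {I_BIAS_A * 1e6:.0f} uA)")
print(f"class AB: {i_class_ab * 1e6:.0f} uA")
print(f"enhancement factor: {i_class_ab / i_class_a:.1f}")
```

The class A figure comes out at about 10.25 µA, essentially the 10 µA bias current, while the class AB stage delivers roughly 500 µA; their ratio, about 48.8, is the enhancement factor of approximately 50 quoted above.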

### 3.4 Application Examples

Two applications of the QFG class AB amplifiers will be presented. The first one is a Variable Gain Amplifier (VGA). The second one is a second-order Sigma-Delta modulator for Electroencephalogram (EEG) applications, achieving a resolution of 10 bits over a bandwidth of 25 Hz using 1.2 V of supply voltage and only 160 nW of power consumption.

#### A. Variable Gain Amplifier

Figure 13(a) shows a linear OTA combined with an amplifier operating as a transresistance amplifier. The circuit is based on the Cherry-Hooper amplifier [13] and behaves as a fully differential CMOS VGA with programmable, accurate gain and a high bandwidth that remains approximately constant as the gain is adjusted. The OTA output is connected to a virtual ground, which allows high bandwidth and relaxes the OTA requirements on output impedance and output swing. Besides, the bandwidth of the circuit is approximately the GBW of the amplifier if  $R_F \ll R_L$.



**Fig. 13** (a) Proposed VGA (b) Implementation of the tunable transconductor

The linear OTA transforms the input voltage  $V_{id}$  into a current  $I = G_m V_{id}$  with  $G_m = 1/R_{tun}$. This current is converted to a voltage by the transresistance amplifier, yielding an output voltage  $V_{od} = R_F I = (R_F/R_{tun})V_{id}$. The gain can be varied by changing the adjustable resistance  $2R_{tun}$  in the linear OTA, which is done by modifying the dc gate control voltage of the two series nMOS transistors in triode region that implement this resistance. The linear OTA is shown in Fig. 13(b). Local feedback is used to transfer the input voltage accurately to the tunable resistor, where it is converted to a current  $I_R = V_{id}/(2R_{tun})$  that is conveyed to the output by a folding stage. The amplifier is the fully differential class AB QFG op-amp of Fig. 10(d).
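Because the gain is a resistor ratio, the tuning range maps directly onto the required range of  $R_{tun}$. The sketch below takes  $R_F = 1.6$  kΩ and the gain range of 1 to 22 reported for the breadboard prototype in the next paragraph. It assumes ideal transfer of  $V_{id}$  onto the tunable resistor, so the triode-region nonlinearity of the MOS resistors is ignored; the function names are illustrative.

```python
import math


def vga_gain(r_f_ohm: float, r_tun_ohm: float) -> float:
    """Differential voltage gain V_od / V_id = R_F / R_tun."""
    return r_f_ohm / r_tun_ohm


def r_tun_for_gain(r_f_ohm: float, gain: float) -> float:
    """R_tun needed for a target gain (total tunable resistance is 2 * R_tun)."""
    return r_f_ohm / gain


R_F_OHM = 1.6e3  # feedback resistance used in the breadboard prototype

for target in (1.0, 10.0, 22.0):  # gain range reported for the breadboard
    r_tun = r_tun_for_gain(R_F_OHM, target)
    print(f"gain {target:4.0f} ({20 * math.log10(target):4.1f} dB): "
          f"R_tun = {r_tun:7.1f} ohm, 2*R_tun = {2 * r_tun:7.1f} ohm")
```

Gains of 1, 10 and 22 require  $R_{tun}$  of 1.6 kΩ, 160 Ω and about 73 Ω, respectively.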

Post-layout simulation results of the circuit in the same CMOS technology used in the other Sections confirm that the bandwidth remains constant at 47 MHz over two decades of gain (1–100) with  $C_L = 15 \text{ pF}$. The SFDR is 64 dB for a gain of 10, an output voltage of 1 Vpp at 1 MHz, and a 3.3 V supply voltage. A breadboard implementation using a CA3280 bipolar OTA and an AD823 op-amp with  $\text{GBW} = 16 \text{ MHz}$  also verifies the circuit. Dual supply voltages of  $\pm 5 \text{ V}$  and  $R_F = 1.6 \text{ k}\Omega$  were used. The bandwidth of the VGA is approximately constant (10.2 MHz) for gains from 1 to 22.

#### B. Second-order Sigma-Delta Modulator

There is an emerging scenario in the field of remote sensing of biomedical signals that requires portable and wearable ultra-low-power equipment. This trend is especially significant for long-term EEG (Electroencephalogram) monitoring of epilepsy and other neurological diseases. Such equipment allows remote EEG monitoring over an extended period of time without requiring the patient to be present at the medical facility.

The basic block diagram of a low-voltage low-power front-end suitable for a wearable EEG system is shown in Fig. 14. The input signal from the scalp electrodes is very slow (0.2–25 Hz) and shows a very low frequency drift due to the variation with time of the electrode impedance. The instrumentation amplifier increases the signal level by 40 dB and at the same time filters out these undesired very low frequency variations. Flicker noise is reduced by a chopper technique. Then an antialiasing filter removes the spectral components above half the Nyquist rate (i.e. above  $50 \text{ Hz}/2=25 \text{ Hz}$) of the subsequent 10-bit ADC. Finally, the signal is digitized and further processed in digital form. Very low voltage and extremely low power are required in the front-end to make the system truly wearable, and to achieve this goal the design of the ADC is critical.



**Fig. 14** Front end of wearable EEG system



**Fig. 15** Second-order Discrete-Time Sigma-Delta Modulator

A simple and power-efficient architecture for high-resolution ADCs in the range of biomedical signals is the classical second-order  $\Sigma\Delta$  modulator (Fig. 15). The architecture is simple, robust against component mismatch and can be made unconditionally stable. Behavioral simulations show a Dynamic Range (DR) of 72 dB with an oversampling ratio of 64, which is enough to achieve 10-bit resolution. To obtain an unconditionally stable modulator and to maximize the integrator output swing, the coefficients have been chosen as  $a_1 = b_1 = b_2 = 0.25$  and  $a_2 = 0.5$.
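A behavioral sketch helps make the role of the coefficients concrete. Fig. 15 is not reproduced here, so the topology below is an assumption: a textbook 1-bit loop with two delaying integrators, in which  $b_1$  and  $b_2$  scale the forward paths and  $a_1$  and  $a_2$  scale the DAC feedback into the first and second integrator. Under that assumption a bounded first integrator forces the bitstream average to equal  $(b_1/a_1)$  times the input, which the usage example checks. For reference, with an oversampling ratio of 64 and a 25-Hz band, the sampling rate is  $2 \times 25 \times 64 = 3.2$  kHz if OSR is defined as  $f_s/(2 f_B)$. The sketch makes no attempt to reproduce the 72-dB dynamic range; the function name is illustrative.

```python
def second_order_sdm(u, a1=0.25, b1=0.25, a2=0.5, b2=0.25):
    """Behavioral 1-bit second-order discrete-time sigma-delta modulator.

    Assumed topology (not taken from Fig. 15):
        x1[n+1] = x1[n] + b1*u[n]  - a1*v[n]
        x2[n+1] = x2[n] + b2*x1[n] - a2*v[n]
        v[n]    = +1 if x2[n] >= 0 else -1   (comparator + 1-bit DAC)
    Returns the bitstream and the peak magnitudes of both integrator states.
    """
    x1 = x2 = 0.0
    peak1 = peak2 = 0.0
    bits = []
    for sample in u:
        v = 1.0 if x2 >= 0.0 else -1.0
        bits.append(v)
        # Tuple assignment: the second integrator uses the old x1 (delaying integrators).
        x1, x2 = x1 + b1 * sample - a1 * v, x2 + b2 * x1 - a2 * v
        peak1 = max(peak1, abs(x1))
        peak2 = max(peak2, abs(x2))
    return bits, peak1, peak2


# Usage: a dc input of 0.3, normalized to the DAC levels +/-1.
bits, peak_x1, peak_x2 = second_order_sdm([0.3] * 20000)
print(f"bitstream mean = {sum(bits) / len(bits):.4f}")
print(f"peak |x1| = {peak_x1:.3f}, peak |x2| = {peak_x2:.3f}")
```

A bitstream mean close to 0.3 with bounded integrator states indicates that the assumed loop is stable for this input; characterizing output swing over the full input range would require sweeping the input amplitude.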

Figure 16 shows the SC implementation of the modulator of Fig. 15. It is composed of two integrators, a comparator and a 1-bit digital-to-analog converter. Correlated Double Sampling (CDS) has been used in the integrators to remove offset and to attenuate flicker noise in the signal band. The CDS integrators operate as follows: during clock phase  $\phi_1$  the amplifier flicker noise and offset are sampled across  $C_{CDS}$; during clock phase  $\phi_2$  the flicker noise and offset are cancelled by the voltage stored on  $C_{CDS}$.

To achieve the required low-voltage and low-power operation, the fully differential class AB QFG amplifier of Fig. 10(d) has been used. Due to the low bandwidth required, transistors are biased in weak inversion to minimize power consumption. Bias current and transistor sizes are set to meet the settling requirements. At 1.2 V supply and a bias current of 3 nA, the op-amp achieves a simulated low-frequency gain of 90 dB, a unity-gain bandwidth of 30 kHz and a slew rate of 10 kV/ms with a 150 fF load capacitor. The current consumption is 4 nA for the input stage and 8 nA for the output stage.

Another issue in the Sigma-Delta modulator of Fig. 16 is the design of the switches to achieve wide dynamic range with only 1.2 V of supply voltage. Rail-to-rail switches are required to maximize the dynamic range. Because the supply voltage is low, the gate overdrive available to transistors used as switches is not enough to turn them on over the whole signal range. To solve this limitation, the QFG technique is also employed in the switches. Figure 17 shows a low-voltage analog switch based on two QFG transistors [14]. The two complementary QFG transistors,  $M_{passN}$  and  $M_{passP}$, are connected in series to obtain rail-to-rail operation. The gate of  $M_{passN}$  is weakly tied to  $V_{DD}$  through a large non-linear resistor implemented by transistor  $M_{Rlarge1}$. The gate is also coupled to the clock signal through a small capacitor,  $C_1$, so that the clock signal is transferred to the quasi-floating gate. The capacitor performs a level shift of approximately  $V_{DD}$, which allows switching under very low-voltage conditions. Note that the switch implemented by  $M_{passN}$  alone is not rail-to-rail, because it is not possible to turn off the transistor for input signals near the negative rail. Rail-to-rail operation is achieved thanks to the QFG transistor  $M_{passP}$. The gate of  $M_{passP}$  is weakly tied to the negative rail through a pMOS transistor ($M_{Rlarge2}$), which acts as a very large voltage-dependent resistor. A complementary clock signal is coupled to the gate of  $M_{passP}$. Now, when the switch is off, transistor  $M_{passP}$  is in cutoff for input signals near the negative rail, whereas transistor  $M_{passN}$  is in cutoff for input signals near the positive rail. Therefore, rail-to-rail operation is achieved.
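The overdrive problem and the QFG solution can be illustrated with a crude large-signal check. The sketch below uses the threshold voltages of the 0.5-µm process quoted in Section 2.3 and the 1.2-V supply, ignores body effect and subthreshold conduction, and assumes the coupling capacitors shift each gate by a full  $V_{DD}$; all of these are simplifications. It compares a conventional transmission gate (parallel devices, gates at  $V_{DD}$  and 0 V) with the series QFG switch of Fig. 17. The function names are illustrative.

```python
# Thresholds of the 0.5-um process quoted in Section 2.3 (body effect ignored).
VTN = 0.67        # nMOS threshold, V
VTP_ABS = 0.96    # magnitude of the pMOS threshold, V
VDD = 1.2         # modulator supply, V (negative rail at 0 V)


def nmos_on(v_gate: float, v_in: float) -> bool:
    # Crude "on" test: gate must exceed the channel voltage by VTN.
    return v_gate - v_in > VTN


def pmos_on(v_gate: float, v_in: float) -> bool:
    return v_in - v_gate > VTP_ABS


def transmission_gate_conducts(v_in: float) -> bool:
    """Conventional parallel nMOS/pMOS switch, gates driven to VDD and 0 V."""
    return nmos_on(VDD, v_in) or pmos_on(0.0, v_in)


def qfg_switch_conducts(v_in: float, closed: bool) -> bool:
    """Series QFG switch of Fig. 17, idealized.

    Assumes the coupling capacitors shift the gates by a full VDD:
    closed -> nMOS gate 2*VDD, pMOS gate -VDD; open -> VDD and 0 V.
    Series devices conduct only if both are on.
    """
    if closed:
        return nmos_on(2 * VDD, v_in) and pmos_on(-VDD, v_in)
    return nmos_on(VDD, v_in) and pmos_on(0.0, v_in)


grid = [i * 0.01 for i in range(121)]  # input levels from 0 V to 1.2 V
dead = [v for v in grid if not transmission_gate_conducts(v)]
print(f"transmission gate off between {min(dead):.2f} V and {max(dead):.2f} V")
print("QFG switch conducts over whole range when closed:",
      all(qfg_switch_conducts(v, True) for v in grid))
print("QFG switch blocks over whole range when open:",
      not any(qfg_switch_conducts(v, False) for v in grid))
```

Under these assumptions the conventional gate has a dead zone from about 0.53 V to 0.96 V where neither device conducts, whereas the boosted series switch conducts across the whole range when closed and, when open, always has at least one device in cutoff.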

The Sigma-Delta modulator has been implemented in the same 0.5  $\mu$m CMOS technology mentioned in previous Sections. The chip microphotograph is shown in Fig. 18(a). A sinusoidal input of 5 Hz and 362 mVpp has been used to characterize the dynamic performance. Figure 18(b) shows the output spectrum of the modulator, featuring an SNDR of 60.82 dB. Under these conditions, the power consumption is only 160 nW. The measured dynamic range is 67.4 dB, which corresponds to 10.75 effective bits.
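The conversion between decibels and effective bits is a one-line formula. The sketch below uses the common full-scale-sine convention  $\text{ENOB} = (X - 1.76)/6.02$; other conventions exist, and with this one the measured 67.4-dB dynamic range maps to about 10.9 bits rather than the 10.75 bits quoted, so the figures are indicative only. The function name is illustrative.

```python
def enob_bits(ratio_db: float) -> float:
    """Effective number of bits from an SNDR or DR in dB: (X - 1.76) / 6.02."""
    return (ratio_db - 1.76) / 6.02


cases = {
    "measured SNDR": 60.82,
    "measured DR": 67.4,
    "behavioral DR": 72.0,
}
for label, value_db in cases.items():
    print(f"{label:>14}: {value_db:5.2f} dB -> {enob_bits(value_db):5.2f} bits")
```

With this convention the measured SNDR at the 362-mVpp input corresponds to about 9.8 bits, and the 72-dB behavioral dynamic range to about 11.7 bits.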



**Fig. 17** QFG rail-to-rail switch



**Fig. 18** (a) Micrograph of the  $\Sigma\Delta$  modulator (b) Measured output spectrum with an input signal of 362 mVpp and 5 Hz

## 4 Conclusions

Novel amplifier topologies able to operate with low supply voltage have been presented. Operation in class AB is obtained with simple techniques, like resistive local common-mode feedback and quasi-floating gate transistors. This simplicity leads to an efficient use of the power supplied. Application examples of the proposed amplifiers have been shown to illustrate potential fields where the proposed techniques are of interest.

**Acknowledgments** This work has been funded in part by the Spanish MEC and FEDER under grant TEC2007-67460-C03.

## References

1. K. de Langen and J.H. Huijsing, "Compact low-voltage power-efficient operational amplifier cells for VLSI," *IEEE JSSC*, Vol. 33, No. 10, Oct. 1998, pp. 1482–1496.
2. R. Harjani, R. Heineke, and F. Wang, "An integrated low-voltage class AB CMOS OTA," *IEEE JSSC*, Vol. 34, No. 2, Feb. 1999, pp. 134–142.
3. A.J. Lopez-Martin, S. Baswa, J. Ramirez-Angulo, and R.G. Carvajal, "Low-voltage super class AB CMOS OTA cells with very high slew rate and power efficiency," *IEEE JSSC*, Vol. 40, May 2005, pp. 1068–1077.
4. J. Ramirez-Angulo and M. Holmes, "Simple technique using local CMFB to enhance slew rate and bandwidth of one-stage CMOS op-amps," *Electronics Letters*, Vol. 38, No. 23, Nov. 2002, pp. 1409–1411.
5. V. Peluso, P. Vancorenland, M. Steyaert, and W. Sansen, "900 mV differential class AB OTA for switched opamp applications," *Electronics Letters*, Vol. 33, No. 17, Aug. 1997, pp. 1455–1456.
6. S. Baswa, A.J. Lopez-Martin, J. Ramirez-Angulo, and R.G. Carvajal, "Low-voltage micropower super class AB CMOS OTA," *Electronics Letters*, Vol. 40, No. 4, Feb. 2004, pp. 216–217.
7. R.G. Carvajal, J. Ramirez-Angulo, A.J. Lopez-Martin, A. Torralba, J. Galán, A. Carlosena, and F. Muñoz, "The Flipped Voltage Follower: a useful cell for low-voltage low-power circuit design," *IEEE TCAS-I*, Vol. 52, No. 7, July 2005, pp. 1276–1291.
8. S. Baswa, A.J. Lopez-Martin, R.G. Carvajal, and J. Ramirez-Angulo, "Low-voltage power-efficient adaptive biasing for CMOS amplifiers and buffers," *Electronics Letters*, Vol. 40, No. 4, Feb. 2004, pp. 217–219.
9. A.J. Lopez-Martin, C.A. De La Cruz, X. Ugalde, R.G. Carvajal, and J. Ramirez-Angulo, "Micropower CMOS S&H circuit for Ambient Intelligence applications," *Electronics Letters*, Aug. 2005, pp. 935–936.
10. A. Baschirotto and G. Frattini, "AC-coupled driver with wide output dynamic range," U.S. Patent 6163176, Dec. 19, 2000.
11. J. Ramirez-Angulo, A.J. Lopez-Martin, R.G. Carvajal, and F.M. Chavero, "Very low-voltage analog signal processing based on quasi-floating gate transistors," *IEEE JSSC*, Vol. 39, No. 3, Mar. 2004, pp. 434–442.
12. J. Ramirez-Angulo, R.G. Carvajal, J.A. Galan, and A. Lopez-Martin, "A free but efficient low-voltage class-AB two-stage operational amplifier," *IEEE TCAS-II*, Vol. 53, No. 7, July 2006, pp. 568–571.
13. E.M. Cherry and D.E. Hooper, "The design of wideband transistor feedback amplifiers," *Proc. IEE*, Vol. 110, Feb. 1963, pp. 375–389.
14. F. Munoz, J. Ramirez-Angulo, A. Lopez-Martin, R. G. Carvajal, A. Torralba, B. Palomo, and M. Kachare, "Analogue switch for very low-voltage applications," *Electronics Letters*, Vol. 39, No. 9, May 2003, pp. 701–702.

# Integrated Amplifier Architectures for Efficient Coupling to the Nervous System

Timothy Denison, Gregory Molnar and Reid R. Harrison

**Abstract** Monitoring the electrical activity of multiple neurons in the brain could enable a wide range of scientific and clinical endeavors. An enabling technology for neural monitoring is the interface amplifier. Current amplifier research is focused on two paradigms of chronically sensing neural activity: one is the measurement of ‘spike’ signals from individual neurons to provide high-fidelity control signals for neuroprostheses, while the other is the measurement of bandpower fluctuations from cell ensembles that convey general information like the intention to move. In both measurement techniques, efforts to merge neural recording arrays with integrated electronics have revealed significant circuit design challenges. For example, weak neural signals, on the order of tens of microvolts rms, must be amplified prior to analysis and are often co-located with frequencies dominated by  $1/f$  and popcorn noise in CMOS technologies. To ensure the highest-fidelity measurement, micropower chopper stabilization is often required to provide immunity from this excess noise. Another difficulty is that strict power constraints place severe limitations on the signal processing, algorithms and telemetry capabilities available in a practical system. These constraints motivate the design of the interface amplifier as part of a total *system-level* solution. In particular, the system solutions we pursued are driven by the key neural signal of interest, and we use the characteristics of the neural code to guide the partitioning of the signal chain. To illustrate the generality of this design philosophy, we discuss state-of-the-art design examples from a spike-based, single-cell system and a field-potential, ensemble neuronal measurement system, both intended for practical and robust neuroprosthetic applications.

## 1 Introduction to Neural Sensing

The measurement of neurophysiological activity spans a range of modalities for applications ranging from seizure monitoring to motor neuroprostheses. As illustrated in Fig. 1, neuronal activity can be measured with a number of techniques, ranging in resolution from single-cell recording of action potentials to the measurement of gross cortical activity with the surface electroencephalogram (EEG). Each technique has its trade-offs. Single-cell recording provides the highest spatial resolution, but at the cost of increased amplifier power, the need for pre-processing of information prior to telemetry, and challenging requirements for chronic electrode-tissue interface stability. EEG provides the least invasive recording method, but at the expense of small signals subject to artifacts and limited spatiotemporal resolution. The measurement of ensemble activity around an electrode, called local field potentials (LFPs), provides a compromise between power dissipation and spatial resolution, but the signals can be smaller than spikes and reside in a frequency band often dominated by  $1/f$  or popcorn noise. In practice, the choice of a particular measurement approach is a balance of several system constraints, including the measurement electrode's spatial resolution, the desired neurophysiological information content, and the power requirements for sensing, algorithm/control and telemetry. Finding the proper balance between signal coding and technical trade-offs is key to building practical neuroprosthetic applications. For a survey of these technologies and applications, we refer the reader to [1–28].

**Fig. 1** Relative comparisons of the three primary neural recording technologies including estimates of spatial resolution, bandwidth and signal levels

---

T. Denison (✉)

Medtronic Neuromodulation Technology, Minneapolis, MN 55410, USA

To address the breadth of state-of-the-art neural recording techniques, this chapter is organized into two sections. The first section discusses the design of single-cell “spike-based” systems that are used for prostheses requiring fine motor control. The second describes amplifiers targeting field potentials, which are useful for ‘simpler’ prosthetic systems like a cursor controller and for monitoring diseases of gross neuronal activity like seizures and movement disorders.

## 2 Neural Spike Amplification

The measurement of neural spikes provides a direct link to the single-cell coding of the brain. This high-fidelity neural decoding can be used in applications such as an arm prosthesis, in which neurons recorded by an electrode array placed in the motor cortex are decoded to servo a robot arm [12]. To be practical, the design of a spike-based system requires careful balancing of signal amplification, power, and data compression.

### A. Design Requirements

Due to the small amplitude of neural signals recorded extracellularly and the high impedance of the electrode-tissue interface, amplification must be performed before these signals can be digitized or analyzed in any way. An integrated front-end amplifier for neural signals must:

- (1) have sufficiently low input-referred noise to resolve spikes as small as  $30\text{ }\mu\text{V}$  in amplitude;
- (2) have sufficient dynamic range to convey spikes or LFPs as large as  $\pm 1\text{--}2\text{ mV}$  in amplitude;
- (3) have much higher input impedance than the electrode-tissue interface and have negligible dc input current;
- (4) amplify signals in the frequency bands of interest (roughly 300 Hz–5 kHz for spikes and 10–200 Hz for local field potentials);
- (5) block dc offsets present at the electrode-tissue interface to prevent saturation of the amplifier; and
- (6) consume little silicon area, and use few or no off-chip components to minimize size.

In addition to these requirements, the amplifier should have a high common-mode rejection ratio (CMRR) to minimize interference from 50/60 Hz power line noise, and a high power-supply rejection ratio (PSRR) if power supply noise is significant (e.g., from ac inductive power links). Arrays of amplifiers should have low crosstalk between channels.

To reduce pickup of 50/60 Hz noise, microphonics, and other capacitively- and inductively-coupled interferers, the distance between electrode and amplifier should be minimized. Additionally, tethering forces introduced by wires cause problems for electrodes inserted into the soft, pliable brain tissue. Thus, the amplifiers are ideally attached directly to the electrodes very near the recording site. This proximity of the electronics to living tissue imposes strict limits on the amount of power that can be dissipated by the circuitry; if cells are exposed to elevated temperatures for extended periods of time, they will die [9, 10]. Thus, we add another requirement for neural signal amplifiers: operation at low power levels to minimize tissue heating.

The precise limits to power dissipation in implanted devices can be difficult to establish. Most devices are designed to limit the chronic heating of surrounding tissue to less than  $1^\circ\text{C}$ . Thus, the size and shape of a device determine its power limits. Preliminary experiments have shown that an implanted cortical 100-electrode array with integrated electronics measuring roughly  $6\text{ mm} \times 6\text{ mm} \times 2\text{ mm}$  can safely dissipate approximately  $10\text{ mW}$  of power [29, 30]. This power limit poses a challenge for high-channel-count recording systems since each electrode requires a dedicated low-noise amplifier.

A rough order-of-magnitude analysis of multi-channel neural recording devices presents a sobering picture for circuit designers: with modern MEMS arrays providing approximately 100 electrodes and a power dissipation limit of  $10\text{ mW}$ , each channel must consume less than  $100\text{ }\mu\text{W}$ , and this does not even include shared resources on a chip such as A/D conversion, power regulation, control, and telemetry circuits.

### B. Circuit Architecture and Design Techniques

Figure 2 shows the schematic of a neural signal amplifier that was first described in [32]. The amplifier is based around an operational transconductance amplifier (OTA) that produces a current proportional to the differential voltage applied to its inputs, where  $G_m$  is the constant of proportionality. A capacitive feedback network consisting of capacitors  $C_1$  and  $C_2$  sets the midband gain of the amplifier. ( $C_{in}$  models the input capacitance of the OTA, as well as any bottom-plate capacitance from  $C_1$  and  $C_2$ .) The input is capacitively coupled through  $C_1$ , so any dc offset from the electrode-tissue interface is removed. To minimize signal attenuation, the impedance of  $C_1$  should be much larger than the electrode impedance; that is,  $C_1$  should be much smaller than the capacitance of the electrode-tissue interface.

The  $R_2$  elements shown in the feedback loop represent lossy elements that set the low-frequency amplifier cutoff; they may be implemented using real resistors, but the MOS-bipolar element used in [32] provides an area-efficient means of creating a small-signal resistance of  $>10^{12} \Omega$  for low-frequency operation (i.e., LFPs). The long time constant associated with this pole can cause the amplifier to recover slowly from large transients, so the  $M_{FS}$  transistors can act as switches to implement a ‘fast settle’ function.

Figure 3(a) shows a gain vs. frequency plot for the neural amplifier in Fig. 2. The approximate transfer function is given by

$$\begin{aligned} \frac{v_{out}}{v_{in+} - v_{in-}} &= \frac{C_1}{C_2} \cdot \frac{1 - s C_2 / G_m}{\left(\frac{1}{s R_2 C_2} + 1\right) \left(s \frac{C_L C_1}{G_m C_2} + 1\right)} \\ &= A_M \frac{1 - s / (2\pi f_z)}{\left(\frac{2\pi f_L}{s} + 1\right) \left(\frac{s}{2\pi f_H} + 1\right)} \end{aligned} \quad (1)$$



**Fig. 2** Schematic of OTA-based neural signal amplifier with capacitive feedback

**Fig. 3** (a) Log-log plot of gain vs. frequency for the neural amplifier shown in Fig. 2. (b) Log-log plot of neural amplifier output noise vs. frequency



The midband gain  $A_M$  is set by the capacitance ratio  $C_1/C_2$ , and the gain is flat between the lower and upper cutoff frequencies  $f_L$  and  $f_H$ . The lower cutoff frequency is determined by the product of  $R_2$  and  $C_2$ , while the upper cutoff is determined by the load capacitance  $C_L$ , the OTA transconductance  $G_m$ , and the midband gain. Capacitive feedthrough introduces a right-half-plane zero at  $f_z$ . This zero can be pushed to very high frequencies (higher than secondary poles due to parasitic capacitances in the OTA) by setting

$$C_2 \ll \sqrt{C_1 C_L} \quad (2)$$

so that it has little practical effect on amplifier operation.
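The pole and zero locations in Eq. (1) can be evaluated numerically. The sketch below is illustrative only: the component values are hypothetical, chosen to land in the spike band, and it simply encodes the expressions for  $A_M$ ,  $f_L$ ,  $f_H$  and  $f_z$  given above. It also shows why condition (2) keeps the zero out of the way, since  $f_z/f_H = C_1 C_L / C_2^2$ .

```python
import math

def amplifier_corners(c1, c2, c_l, g_m, r2):
    """Return (A_M, f_L, f_H, f_z) implied by Eq. (1)."""
    a_m = c1 / c2                                  # midband gain C1/C2
    f_l = 1.0 / (2 * math.pi * r2 * c2)            # low-frequency pole set by R2*C2
    f_h = g_m * c2 / (2 * math.pi * c_l * c1)      # high-frequency pole, Gm/(2*pi*A_M*C_L)
    f_z = g_m / (2 * math.pi * c2)                 # RHP zero from feedthrough via C2
    return a_m, f_l, f_h, f_z

# Hypothetical component values, chosen only for illustration.
a_m, f_l, f_h, f_z = amplifier_corners(c1=20e-12, c2=200e-15, c_l=17e-12,
                                       g_m=100e-6, r2=1e12)
print(f"A_M = {a_m:.0f}, f_L = {f_l:.2f} Hz, f_H = {f_h / 1e3:.2f} kHz, "
      f"f_z = {f_z / 1e6:.1f} MHz")
# Eq. (2): C2 << sqrt(C1*C_L) is equivalent to f_z >> f_H.
print("sqrt(C1*C_L)/C2 =", math.sqrt(20e-12 * 17e-12) / 200e-15)
```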

The thermal noise sources in the neural amplifier are shown in Fig. 2 as voltage sources  $v_{nia}$  and  $v_{nR}$ . The source  $v_{nia}$  models the input-referred voltage noise of the OTA. The two  $v_{nR}$  sources model the thermal noise (or Johnson noise) contributed by the resistive  $R_2$  elements in the feedback loop. If both  $v_{nia}$  and  $v_{nR}$  are taken to be white (i.e., ignoring 1/f noise), their contributions to the total amplifier output noise are shown in Fig. 3(b). The OTA contributes noise primarily between  $f_L$  and  $f_H$ . Below a particular frequency, the noise contribution from  $v_{nR}$  will dominate; we denote this frequency  $f_{corner}$ . If  $R_2$  is implemented as a real resistor so that its noise spectral density is

$$v_{nR}^2(f) = 4kT R_2 \quad (3)$$

and  $C_1 \gg C_2, C_{\text{in}}$ , then  $f_{\text{corner}}$  is approximately

$$f_{\text{corner}} \approx \sqrt{\frac{3C_L}{2C_1} f_L f_H}. \quad (4)$$

(A similar result is obtained for the MOS-bipolar element used as  $R_2$  in [32].) To minimize the noise contribution from the  $R_2$  elements, we should ensure that  $f_{\text{corner}} \ll f_H$ . For resistive  $R_2$  elements, this can be accomplished by designing the amplifier so that

$$\frac{C_L}{C_1} \ll \frac{2}{3} \frac{f_H}{f_L}. \quad (5)$$
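Equations (4) and (5) lend themselves to a quick design check. The following is a minimal sketch, assuming resistive  $R_2$  elements and  $C_1 \gg C_2, C_{\text{in}}$  as in the text; the numerical values and the factor used to interpret "$\ll$" (`margin`) are hypothetical choices for illustration.

```python
import math

def f_corner(c1, c_l, f_l, f_h):
    """Eq. (4): frequency below which R2 thermal noise dominates (C1 >> C2, C_in)."""
    return math.sqrt(1.5 * (c_l / c1) * f_l * f_h)

def r2_noise_negligible(c1, c_l, f_l, f_h, margin=10.0):
    """Eq. (5), reading '<<' as 'smaller by at least `margin`' (an assumption)."""
    return margin * (c_l / c1) < (2.0 / 3.0) * (f_h / f_l)

# Hypothetical values: C1 = 20 pF, C_L = 17 pF, f_L = 1 Hz, f_H = 7 kHz.
fc = f_corner(20e-12, 17e-12, 1.0, 7e3)
print(f"f_corner = {fc:.1f} Hz, f_corner/f_H = {fc / 7e3:.4f}")
print("R2 noise negligible:", r2_noise_negligible(20e-12, 17e-12, 1.0, 7e3))
```

Raising  $f_L$  toward  $f_H$  (for example, to 1 kHz with the same capacitors) makes the check fail, which matches the intuition that a high lower cutoff lets the feedback-element noise spread into the passband.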

In practical circuits, the  $1/f$  noise from the OTA may dominate the noise contributed by the  $R_2$  elements. However, if multi-transistor, amplifier-based circuits are used as  $R_2$  feedback elements, the increased thermal noise from these circuits may masquerade as increased  $1/f$  noise as shown in Fig. 3(b).

If the noise contribution from  $R_2$  is negligible (i.e.,  $f_{\text{corner}} \ll f_H$ ) and  $C_1 \gg C_2, C_{\text{in}}$ , then the output rms noise voltage of the neural amplifier in Fig. 2 is dominated by the noise from the OTA. Thus, the design of the OTA is crucial to minimize the overall noise of the neural amplifier. We use a cascoded current-mirror OTA as shown in Fig. 4, but other topologies such as a folded cascode amplifier would work as well. The input-referred thermal noise spectral density of this OTA is given by

$$v_{nia}^2(f) = \frac{16kT}{3g_{m1}} \left( 1 + 2\frac{g_{m3}}{g_{m1}} + \frac{g_{m7}}{g_{m1}} \right) \quad (6)$$

where  $g_{m1}$  is the transconductance of the input devices  $M_1$  and  $M_2$ ,  $g_{m3}$  represents the transconductance of the nMOS current mirror devices  $M_3$ – $M_6$ , and  $g_{m7}$  represents the transconductance of the pMOS current mirror devices  $M_7$  and  $M_8$ . The biasing transistors ( $M_{M1}$  and  $M_{M2}$ ) and the cascode transistors ( $M_{C1}$  and  $M_{C2}$ ) contribute negligible noise.

**Fig. 4** Schematic of operational transconductance amplifier (OTA) used in the neural amplifier shown in Fig. 2

As described in [32], the input-referred noise of this OTA can be minimized by ensuring that  $g_{m1} \gg g_{m3}, g_{m7}$ . This is accomplished by sizing the transistors so that  $M_1$  and  $M_2$  operate in weak inversion where the ratio of device transconductance to drain current ( $g_m/I_D$ ) is maximum and  $M_3$ – $M_8$  operate deep in strong inversion where  $g_m/I_D$  is greatly reduced [33–36].
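To see why this sizing strategy pays off, Eq. (6) can be evaluated with simple device models. This is a sketch under stated assumptions: the textbook weak-inversion model  $g_m = \kappa I_D / U_T$  with a hypothetical  $\kappa = 0.7$ , the square-law strong-inversion model  $g_m = 2 I_D / V_{ov}$  with a hypothetical overdrive of 0.5 V, and equal (hypothetical) drain currents in all branches. It compares the resulting excess-noise factor against the case where every device has the same  $g_m$ .

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def gm_weak_inversion(i_d, kappa=0.7, temp=300.0):
    """Assumed weak-inversion model: g_m = kappa * I_D / U_T."""
    return kappa * i_d / (K_B * temp / Q_E)

def gm_strong_inversion(i_d, v_ov):
    """Assumed square-law saturation model: g_m = 2 * I_D / V_ov."""
    return 2.0 * i_d / v_ov

def ota_noise_psd(gm1, gm3, gm7, temp=300.0):
    """Eq. (6): input-referred thermal noise PSD of the OTA in V^2/Hz."""
    return 16 * K_B * temp / (3 * gm1) * (1 + 2 * gm3 / gm1 + gm7 / gm1)

i_d = 4e-6                                   # hypothetical branch current
gm1 = gm_weak_inversion(i_d)                 # M1, M2 in weak inversion
gm_mirror = gm_strong_inversion(i_d, 0.5)    # M3-M8 deep in strong inversion
ideal = 16 * K_B * 300.0 / (3 * gm1)         # PSD if only M1/M2 contributed
print(f"excess factor, mirrors in strong inversion: "
      f"{ota_noise_psd(gm1, gm_mirror, gm_mirror) / ideal:.2f}")
print(f"excess factor, all g_m equal:               "
      f"{ota_noise_psd(gm1, gm1, gm1) / ideal:.2f}")
```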

Perhaps the most critical tradeoff in neural amplifier design is that between power dissipation and input-referred noise. A dimensionless figure of merit that clearly captures this tradeoff is the noise efficiency factor (NEF), first proposed in [31]:

$$\text{NEF} \equiv V_{ni,\text{rms}} \sqrt{\frac{2I_{\text{tot}}}{\pi \cdot U_T \cdot 4kT \cdot \text{BW}}} \quad (7)$$

where  $I_{\text{tot}}$  is the total amplifier supply current,  $U_T$  is the thermal voltage  $kT/q$ , BW is the amplifier bandwidth, and  $V_{ni,\text{rms}}$  is the amplifier's input-referred rms voltage noise. An amplifier with noise contributed only by the thermal noise of a single ideal bipolar transistor has an  $\text{NEF} = 1$ ; all physical circuits have  $\text{NEF} > 1$ . In [32], we demonstrated that the NEF of CMOS neural amplifiers is minimized by selectively operating transistors in weak or strong inversion as described above.
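Evaluating Eq. (7) directly is a convenient way to compare designs. The sketch below assumes  $T = 300$  K, and the operating point (2.2  $\mu$ V rms noise, 16  $\mu$ A supply current, 7.2 kHz bandwidth) is a hypothetical example rather than a measured result from this chapter.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def nef(v_ni_rms, i_tot, bw, temp=300.0):
    """Eq. (7): noise efficiency factor (dimensionless)."""
    u_t = K_B * temp / Q_E                              # thermal voltage kT/q
    return v_ni_rms * math.sqrt(2 * i_tot / (math.pi * u_t * 4 * K_B * temp * bw))

# Hypothetical amplifier: 2.2 uVrms input noise, 16 uA total current, 7.2 kHz bandwidth.
example = nef(2.2e-6, 16e-6, 7.2e3)
print(f"NEF = {example:.2f}")
```

Because the NEF grows as  $\sqrt{I_{\text{tot}}}$  at fixed noise, quadrupling the supply current without lowering the noise doubles the NEF.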

Figure 5 shows a photograph of a 100-channel neural recording system with integrated ADC and wireless RF telemetry. The chip measures  $4.7 \text{ mm} \times 5.9 \text{ mm}$  after fabrication in a  $0.5\text{-}\mu\text{m}$  2-poly, 3-metal CMOS process. Each amplifier fits into a layout area of  $400 \mu\text{m} \times 400 \mu\text{m}$  so that it may be flip-chip bonded to the back of a Utah Electrode Array for complete integration. The amplifiers on this chip were designed for an input-referred noise of  $5 \mu\text{V}_{\text{rms}}$  to reduce the required layout area. Since the layout area of neural amplifiers is typically dominated by capacitors, and the capacitance per unit area  $C'$  of linear capacitors does not scale dramatically in deep-submicron processes, moving to smaller processes yields only modest area savings.

**Fig. 5** Photograph of 100-channel neural recording integrated circuit. The chip measures  $4.7 \text{ mm} \times 5.9 \text{ mm}$  and includes an ADC, spike detectors, and a wireless RF telemetry system [8]



### C. Signal Digitization

To permit the robust transmission of neural data across a wireless channel, the amplified neural signals must be converted into a digital representation. Figure 6 shows a variety of techniques for performing this digitization. In all cases, a preamplifier must be used first to boost the microvolt-level electrode signal and dramatically lower the driving impedance.

The most straightforward technique for digitizing the neural signal is to pass the wideband amplified signal through an analog-to-digital converter (ADC), as shown in Fig. 6(a). If LFP information is not needed, it can be eliminated with a high-pass filter prior to digitization, as shown in Fig. 6(b). Most commercial neural recording equipment (mounted in large rack-mount cases and supplied by ac wall power) operates in one of these modes using sampling rates of approximately 30 kS/s with resolutions of 12–16 bits (e.g., [38]). These systems thus produce data rates of 36–48 Mb/s from a 100-electrode array. A reduced sampling rate of 15 kS/s and resolution of 10 bits [as shown in Fig. 6(a) and (b)] is sufficient for most scientific and clinical applications, but this still yields a data stream of 15 Mb/s for 100 electrodes.
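The data rates quoted above follow from multiplying channels, sampling rate, and resolution. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def raw_data_rate(channels, sample_rate_hz, bits_per_sample):
    """Bits per second when every channel is digitized continuously (Fig. 6(a)/(b))."""
    return channels * sample_rate_hz * bits_per_sample

for fs, bits in [(30_000, 12), (30_000, 16), (15_000, 10)]:
    rate = raw_data_rate(100, fs, bits)
    print(f"{fs / 1e3:.0f} kS/s x {bits} b x 100 ch = {rate / 1e6:.0f} Mb/s")
```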

Transmitting data at these rates over a wireless transcutaneous link is difficult or impossible to achieve in small, implanted systems that are severely power constrained. RF links are handicapped by the fact that the tissue absorption of electromagnetic radiation follows an  $f^2$  trend. Infrared light penetrates bone and tissue with little attenuation, but optical links require a fair amount of power. Recently, transcutaneous data transfer at 40 Mb/s was demonstrated, but the transmitter consumed 120 mW [39]. Implantable high-channel-count neural recording devices will therefore likely require circuitry for on-chip data compression.



**Fig. 6** Block diagram showing six different techniques for digitizing various aspects of a neural signal

## 3 Efficient Signal Processing: Adaptive Neural Spike Detection

When one considers the nature of typical neural signals, it is clear that far too much information is being transmitted in Fig. 6(a) and 6(b). For many scientific and neuromimetic applications, the only relevant information is the presence and timing of action potentials to an accuracy of approximately 1 ms. Detecting the presence or absence of a spike every 1 ms produces a 100 kb/s data stream for a 100-electrode system. This data rate could be reduced even further by the use of an asynchronous protocol that transmits data only when spikes appear (e.g., [40]). Cortical neurons exhibit firing rates around 10 Hz, and in a 100-channel system, the “address” of each spike can be encoded in a 7-bit number representing its electrode of origin. If we transmit an address only when a spike occurs, our data rate can be reduced to an average of 7 kb/s. A system described in [7] sends the address of spikes and uses a 5-bit ADC to transmit the amplitude of the spike as well.
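The reduced rates in this paragraph can be reproduced the same way. A minimal sketch, with hypothetical function names: one presence/absence bit per channel per 1 ms bin, or a  $\lceil \log_2 100 \rceil = 7$ -bit address sent only when a neuron fires at its average rate. The optional amplitude bits correspond to the 5-bit amplitude of the system in [7].

```python
import math

def binary_spike_rate(channels, bin_ms=1):
    """One presence/absence bit per channel per time bin, in bits per second."""
    return channels * 1000 / bin_ms

def address_event_rate(channels, mean_firing_hz, amplitude_bits=0):
    """Average bits per second when each spike sends only its electrode address
    (plus optional amplitude bits)."""
    address_bits = math.ceil(math.log2(channels))   # 7 bits for 100 channels
    return channels * mean_firing_hz * (address_bits + amplitude_bits)

print(binary_spike_rate(100))                          # 1 ms bins, 100 channels
print(address_event_rate(100, 10))                     # addresses only
print(address_event_rate(100, 10, amplitude_bits=5))   # address + 5-bit amplitude
```

These are average rates; an asynchronous link must still be sized for bursts in which many channels fire together.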

The remaining problem is how to perform this data reduction from noisy analog waveform to identified spikes in a small, low-power device. The amplitude of spikes recorded extracellularly can vary widely from one electrode to the next depending on the relative position and orientation of the recording site and the cell carrying the impulse. Additionally, background noise caused by distant neural activity, electrode noise, and electronic noise in the preamplifier can vary with time, temperature, and electrode position.

A straightforward technique shown in Fig. 6(c) is to set a spike-detection threshold manually using a digital-to-analog converter (DAC). This technique has been implemented in a 100-channel wireless neural recording system that transmits one user-selectable channel using the technique shown in Fig. 6(b), while spike data from all 100 channels are transmitted using manual spike thresholding [8]. The ADC allows the user to observe the waveform from each electrode in turn and set an appropriate spike-detection threshold using local DACs.

In the future, it would be advantageous for the implanted device to autonomously set spike detection thresholds for each channel. In pursuit of this goal, we developed a small mixed-signal circuit to adaptively set spike detection thresholds above a background noise level.

### A. Adaptive Spike Detection Algorithm

The goal of our spike-detection algorithm (first described in [41]) is to adaptively set a detection threshold that is low enough to capture action potentials, but high enough to reject occasional peaks in the background noise. We assume Gaussian background noise having a mean of zero. (Measured background noise from actual neural recordings has a roughly Gaussian distribution, though the tails are slightly wider [42].) Therefore the noise is entirely described by its rms value, which is equivalent to its standard deviation,  $\sigma$ . If we can measure the rms level  $\sigma$  of the background noise, we can set a threshold to some multiple of  $\sigma$  and reject all but a vanishingly small fraction of the background noise. For example, with a threshold of  $5\sigma$ , the probability of Gaussian noise triggering the spike detector is approximately  $3 \times 10^{-7}$ .
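The probabilities quoted here are one-sided Gaussian tail areas,  $P(X > N\sigma) = \tfrac{1}{2}\,\mathrm{erfc}(N/\sqrt{2})$ . A short sketch for checking them (the function name is hypothetical). As noted above, measured neural noise has slightly wider tails than a Gaussian [42], so real false-trigger probabilities will be somewhat higher.

```python
import math

def gaussian_exceedance(n_sigma):
    """One-sided probability that zero-mean Gaussian noise exceeds n_sigma * sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

for n in (1, 3, 4, 5):
    print(f"threshold {n} sigma: P = {gaussian_exceedance(n):.3g}")
```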

**Fig. 7** If Gaussian noise (top) is passed through a comparator having a threshold set to the rms value of the noise (dotted line), the resulting digital signal (bottom) made up of zeros and ones has a dc level of 0.159 (dotted line)



To develop a simple method for measuring  $\sigma$ , we observe that if a threshold is set at  $\sigma$ , the probability of Gaussian noise exceeding this threshold is 0.159. Figure 7(a) shows a noise waveform and a threshold level of  $1\sigma$ . After comparing the noise with this threshold, we get a digital waveform having a duty cycle (i.e., the fraction of time the waveform is high) of 0.159 [see Fig. 7(b)]. The duty cycle is proportional to the dc level of this digital waveform, and we can use this signal as feedback to servo a reference voltage to the  $1\sigma$  level of the waveform.

Figure 8 shows a block diagram of the proposed adaptive spike detection algorithm. Comparator A is used in a feedback loop (with a gain of  $K$ ) that servos the duty cycle of its output to 0.159, thus setting  $V_{1\sigma}$  to the rms level of the input waveform. This voltage is then amplified by a constant  $N$  typically having a value of five or greater. The resulting voltage  $V_{N\sigma}$  is used as the threshold level for Comparator B. Thus, the circuit performs spike detection using a specified multiple of the background noise rms value.
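The loop in Fig. 8 can be explored with a discrete-time behavioral model. The following is a minimal sketch, not the chip's implementation: the leaky-integrator coefficient `alpha`, the loop gain `k_gain`, the initial threshold, the noise level, and the crude six-sample biphasic spikes are all hypothetical choices. Comparator A's duty cycle is low-pass filtered, and  $V_{1\sigma}$  is integrated until that duty cycle reaches 0.159; Comparator B then fires when the input exceeds  $N \cdot V_{1\sigma}$ .

```python
import random

def adaptive_spike_detector(samples, k_gain=1e-2, alpha=1e-3, n_mult=5.0,
                            target_duty=0.159, v1_init=1.0):
    """Behavioral sketch of the Fig. 8 loop (all parameters hypothetical).

    Comparator A's output is low-pass filtered (leaky integrator, coefficient
    alpha) to estimate its duty cycle; V_1sigma is integrated with gain k_gain
    until that duty cycle equals target_duty. Comparator B compares the input
    against V_Nsigma = n_mult * V_1sigma.
    """
    v1 = v1_init              # V_1sigma: Comparator A threshold
    duty = target_duty        # low-pass estimate of Comparator A duty cycle
    v1_track, detect = [], []
    for x in samples:
        a_out = 1.0 if x > v1 else 0.0          # Comparator A
        duty += alpha * (a_out - duty)          # first-order low-pass (running average)
        v1 += k_gain * (duty - target_duty)     # too many crossings -> raise V_1sigma
        v1_track.append(v1)
        detect.append(x > n_mult * v1)          # Comparator B at V_Nsigma
    return v1_track, detect

# Synthetic test (hypothetical, in mV at the preamplifier output): Gaussian noise
# with 5.5 rms plus ten crude biphasic spikes of +/-70 in the second half.
rng = random.Random(1)
sigma, n = 5.5, 200_000
x = [rng.gauss(0.0, sigma) for _ in range(n)]
spike_starts = range(110_000, 200_000, 9_000)
for s in spike_starts:
    for j in range(3):
        x[s + j] += 70.0          # positive phase
        x[s + 3 + j] -= 70.0      # negative phase keeps the spike ac balanced
v1_track, detect = adaptive_spike_detector(x)
settled = v1_track[n // 2:]
v1_mean = sum(settled) / len(settled)
detections = sum(1 for k in range(n // 2 + 1, n) if detect[k] and not detect[k - 1])
print(f"mean V_1sigma after settling: {v1_mean:.2f} (true sigma {sigma})")
print(f"rising edges at Comparator B: {detections} (spikes injected: {len(spike_starts)})")
```

In this model the settled  $V_{1\sigma}$  sits close to the true noise rms. Because the injected spikes are ac balanced and rare, they barely perturb the estimate, consistent with the argument in the text.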

The presence of spikes in the waveform will lead to errors in our estimate of the noise rms level since the  $V_{1\sigma}$  feedback loop does not distinguish between spikes and background noise. However, if the spikes are approximately ac balanced (as most biphasic spike waveforms are) and occur relatively infrequently, they should have little effect on the rms noise estimate.

**Fig. 8** Block diagram of the adaptive spike detection algorithm

### B. Circuit Design and Implementation

We implemented the adaptive spike detection algorithm in a CMOS integrated circuit with the goal of minimizing power consumption and chip area. The circuit was completely integrated in a  $1.5\text{-}\mu\text{m}$  2-metal, 2-poly CMOS process, using no off-chip components.

A schematic of the adaptive spike detection circuit is shown in Fig. 9. Comparators A and B are implemented using standard regenerative latch-and-hold topologies [37]. The duty cycle of comparator A is calculated using an OTA to realize a  $G_m$-$C$  low-pass filter. By biasing this OTA in the subthreshold region, cutoff frequencies below 1 Hz may be achieved [34]. The high-frequency oscillations of the digital waveform are attenuated leaving only the dc level, which is proportional to the duty cycle of the waveform. By taking a “running average” of the duty cycle using this leaky integrator, the circuit is able to adapt to time-varying levels of background noise. The time constant of this filter sets the adaptation time constant.
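To get a feel for the bias currents involved in a sub-hertz  $G_m$-$C$  filter, the sketch below assumes a simple differential-pair OTA in weak inversion with  $G_m = \kappa I_{\text{bias}} / (2 U_T)$ , a hypothetical  $\kappa = 0.7$ , and a hypothetical 10 pF filter capacitor. The actual OTA and capacitor values in Fig. 9 may differ.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def gmc_cutoff_hz(g_m, c):
    """First-order G_m-C low-pass: f_c = G_m / (2*pi*C)."""
    return g_m / (2 * math.pi * c)

def subthreshold_gm(i_bias, kappa=0.7, temp=300.0):
    """Assumed differential-pair OTA in weak inversion: G_m = kappa*I_bias/(2*U_T)."""
    return kappa * i_bias / (2 * K_B * temp / Q_E)

def bias_for_cutoff(f_c, c, kappa=0.7, temp=300.0):
    """Tail current that places the G_m-C pole at f_c under the model above."""
    return 2 * (K_B * temp / Q_E) * (2 * math.pi * f_c * c) / kappa

c = 10e-12                                  # hypothetical 10 pF filter capacitor
i_b = bias_for_cutoff(0.5, c)
print(f"I_bias for a 0.5 Hz cutoff: {i_b * 1e12:.2f} pA")
print(f"adaptation time constant C/G_m: {c / subthreshold_gm(i_b):.3f} s")
```

Currents in the picoampere range are only practical because the OTA operates in subthreshold, which is the point made in the paragraph above.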

An nMOS differential pair is used to compare the output of the low-pass filter to the reference voltage  $V_{\text{duty}} = 0.159V_{\text{DD}}$ , which corresponds to a low-pass filter output indicating Comparator A is operating at the  $1\sigma$  threshold level. Current from one leg of the differential pair is mirrored using a pMOS current mirror and driven into two resistors in series. These resistors convert the current into two voltages:  $V_{1\sigma} = IR$  and  $V_{5\sigma} = 5IR$ . To save chip area, these resistors were implemented as nMOS transistors operating in the deep triode (linear) region.



**Fig. 9** Schematic of the adaptive spike detection circuit

### C. Circuit Testing

We tested the adaptive spike detector using a synthetic waveform programmed into an arbitrary waveform generator (Agilent 33120A). The test waveform consisted of three typical extracellular action potentials embedded in a background of Gaussian noise, and represented the output from a preamplifier in a neural recording system. The first 10 ms of the test waveform is shown as the input waveform in Fig. 10. The rest of the waveform consisted only of noise. The waveform was 80 ms in length and was played in a loop so the burst of three spikes appeared periodically at a rate of 12.5 Hz.

We applied this waveform to the input of the adaptive spike detector. The amplitude of the waveform was set so that the largest spike had an amplitude of 70 mV, and the background noise had an rms value of 5.5 mV. (Assuming a preamplifier with a gain of 60 dB, this corresponds to a spike amplitude of 70  $\mu$ V and a noise rms value of 5.5  $\mu$ V at the electrode.) Fig. 10 shows the input waveform along with the value of  $V_{5\sigma}$  and the output of Comparator B. The adaptive spike detector successfully sets the threshold to an appropriate level to detect spikes but reject noise.

The amplitude of the input waveform (largest spike) was varied from 23 mV to 116 mV (and the rms noise level varied from 1.8 mV to 9.2 mV). The circuit functioned correctly as the amplitude of the background noise changed by a factor of five. Figure 11 shows the response of the circuit to a waveform containing only noise and no spikes. The algorithm succeeded in rejecting the noise completely despite occasional peaks in the Gaussian waveform. (In Figs. 10 and 11, the 0–5 V digital output voltage is scaled down for clarity.) The circuit consumed 57  $\mu$ W, and the two comparators consumed 91% of this power, so future work will focus on reducing their power dissipation.



**Fig. 10** Measured output of adaptive spike detection chip for input amplitude of 70 mV

**Fig. 11** Output of adaptive spike detection chip with input of bandlimited Gaussian noise only



## 4 Neural Field Potential Amplification

The previous section discussed the design of efficient spike-based measurement systems. A complementary area of neural signal processing is being explored with local field potentials (LFPs). LFPs represent the ensemble activity in the vicinity of the electrode, which can include many thousands of cells. These signals can be useful for measuring grosser activity in a neural circuit, such as generalized motor activity for simple prosthetics, seizures, and awareness. This section provides an overview of state-of-the-art techniques for field potential measurement, which also translate to surface EEG recording.

### A. Design Requirements

Low-frequency power fluctuations of LFPs within discrete frequency bands provide a useful biomarker for discriminating normal physiological brain activity from pathological states. Because LFPs represent the ensemble activity of thousands to millions of cells in an *in vivo* neural population, their recording generally avoids chronic issues like tissue encapsulation and micromotion encountered in single-unit recording [48, 49]. LFP biomarkers are ubiquitous, span a broad frequency spectrum, from  $\sim 1$  Hz oscillations in deep sleep to  $>500$ Hz “fast ripples” in the hippocampus, and vary widely in bandwidth. As an example, Fig. 12 illustrates high gamma band power fluctuations in the motor cortex signaling motion intent. The ability of a primate to signal an intention to move, and further modulate this band for refinement of control, has motivated its use as the input for a prosthetic actuator [48, 49]. This example also demonstrates a trend in neuronal sensing systems towards tracking bandpower at higher frequencies, in signals that were previously filtered out of surface EEG recordings [49]. This trend makes it increasingly costly to track key biomarkers with digital processing, because of the power penalty of Nyquist sampling and high-rate digital processing.

As the LFP biomarkers increase in frequency, the nature of their encoding motivates a new circuit architecture that directly extracts energy at key neuronal bands and tracks the relatively slow power fluctuations, much as an AM radio produces audio signals from the high-frequency carrier signal. By partitioning the neural interface for analog extraction of the relevant power fluctuations prior to digitization, the back-end requirements for sampling, algorithms, memory, and telemetry are relieved [50].

**Fig. 12** An example of spectral band fluctuations in the motor cortex *preceding* motion [48]. This biomarker can be used to control a prosthesis using bandpower fluctuations under conscious control. Reprinted with permission

Similar to a spike-based system, the small amplitude of neural signals recorded extracellularly requires amplification before these signals can be digitized or analyzed in any way. An integrated front-end amplifier for neural signals must:

- (7) have sufficiently low input-referred noise to resolve local field potential (LFP) fluctuations as small as  $1 \mu\text{V}\text{-rms}$  in amplitude;
- (8) have sufficient dynamic range to convey LFPs as large as  $\pm 1\text{--}2 \text{ mV}$  in amplitude;
- (9) have much higher input impedance than the electrode-tissue interface and have negligible dc input current (note that the impedance of large platinum-iridium electrodes is an order of magnitude smaller than that of a MEMS array);
- (10) amplify signals in the frequency bands of interest (roughly 1–500 Hz for local field potentials);
- (11) reject low frequency drift ( $1/f$ ) and popcorn noise which might compromise the signal;
- (12) block dc offsets present at the electrode-tissue interface to prevent saturation of the amplifier; and
- (13) use few or no off-chip components to minimize size. Note that for this application amplifier size is not as critical since we are measuring ensemble activity and do not require a large array of amplifiers.

### B. Additional System Considerations

To achieve a practical chronic LFP measurement system, we want to extract the key physiological information prior to digitization. This design approach is similar in spirit to the 1-bit digitizer in a spike-based prosthesis, but we now focus on bandpower measurements. As highlighted in Fig. 13, we partitioned our sensing and algorithmic research prototype into three key blocks: a sense interface amplifier that connects to the electrodes for conditioning and amplifying field potentials, a microprocessor or equivalent processing unit for performing algorithms on the signal based on feature extraction, and a memory unit for recording events or general data-logging. The partitioning of the signal chain between analog and digital blocks is not arbitrary: we focused on designing a robust analog front-end to extract the core information of interest and thereby maximize information content prior to digitization. This allows us to run the digitizer and algorithms in the microprocessor at low rates, utilizing less than one percent of the available processor resources and keeping system power below 25  $\mu\text{W}$ /sensing channel. To put this power budget into perspective, DBS therapies require on the order of 500  $\mu\text{W}$  for tissue stimulation. Although using a microprocessor provides the necessary flexibility in the algorithm, this architecture places demands on the design of the analog preprocessing block to maintain flexibility and acceptable accuracy over manufacturing corners.

**Fig. 13** Prototype system architecture for the neurostimulator research tool

### C. Micropower Chopper-Stabilized Amplification Strategy

To resolve LFPs in the presence of 1/f and popcorn noise, we used a chopper-stabilized amplifier. Chopper stabilization is an established technique for suppressing offsets and drift, and has been explored extensively for biomedical applications [51, 52]. Figure 14 illustrates the core elements of a typical open-loop chopper amplifier. At the input, a CMOS switch modulator shifts the input signal,  $V_{\text{in}}$ , prior to entering the amplifier at node  $V_A$ . The choice of the modulation (chopping) frequency is set by the amplifier's excess noise, illustrated as "aggressors" superimposed at node  $V_A$ . The modulation frequency should be higher than the 1/f noise corner, as described in [57]. Post-amplification, a second demodulator at  $V_A'$  translates the signal back to baseband while shifting the aggressors up to the modulation frequency. The final lowpass filter of the signal at  $V_B$  then suppresses the up-modulated offsets and  $1/f$  noise from the amplifier at the output,  $V_{out}$ . Chopper stabilization suppresses the low-frequency noise with minimal signal or noise aliasing.

**Fig. 14** Distortion and headroom problems encountered with an open-loop, low-power chopper amplifier architecture [57]
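The modulate, amplify, demodulate, and lowpass sequence can be illustrated with an idealized discrete-time model. This is a sketch under simplifying assumptions, not a model of the circuit in Fig. 14: ideal ±1 switching, an ideal gain, a constant 1 mV aggressor added at node  $V_A$ , and a one-period moving average standing in for the output lowpass filter. All numerical values are hypothetical.

```python
import numpy as np

def chopper_amplifier(v_in, offset, gain, fs, f_chop):
    """Idealized open-loop chopper with a dc aggressor added at node V_A."""
    period = int(round(fs / f_chop))                      # samples per chopping period
    k = np.arange(len(v_in))
    chop = np.where(k % period < period // 2, 1.0, -1.0)  # ideal +/-1 modulating clock
    v_a = v_in * chop + offset        # input modulated up; aggressor added unmodulated
    v_b = (gain * v_a) * chop         # demodulator: signal to baseband, offset to f_chop
    kernel = np.ones(period) / period # moving average over one chop period (lowpass)
    return np.convolve(v_b, kernel, mode="same")

fs, f_chop, gain = 100e3, 1e3, 100.0
t = np.arange(int(fs)) / fs                   # 1 s of samples
v_in = 10e-6 * np.sin(2 * np.pi * 10 * t)     # 10 uV, 10 Hz test signal
offset = 1e-3                                 # 1 mV input-referred aggressor
v_out = chopper_amplifier(v_in, offset, gain, fs, f_chop)
inner = slice(200, len(t) - 200)              # skip filter edge effects
err_chopped = np.max(np.abs(v_out[inner] - gain * v_in[inner]))
err_plain = np.max(np.abs(gain * (v_in + offset) - gain * v_in))
print(f"output error with chopping:    {err_chopped:.2e} V")
print(f"output error without chopping: {err_plain:.2e} V")
```

A dc aggressor cancels exactly here because every one-period window contains equal numbers of +1 and -1 clock samples. Real 1/f noise is only suppressed to the extent that it is slow compared with the chopping frequency, which is why the chopping frequency is placed above the 1/f corner.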

As summarized in [51], chopper-stabilized amplifiers provide excellent micropower conditioning of a neural signal in the presence of excess noise from the process. Figure 15, for example, shows the elimination of popcorn noise from a signal by chopping above the Lorentzian corner for the process. In addition, the baseline NEF on the order of 4 represents state-of-the-art performance for general low-noise instrumentation amplifiers. *However, as highlighted in Fig. 13, the amplification of the signal is only one facet of a complete system design.*

The extraction of the biomarkers requires additional signal processing, and this must be done with a power budget on the order of  $10\text{ }\mu\text{W}$ . If done with a microprocessor, this requires digitization of the signal at relatively high Nyquist sampling rates ( $\sim 500\text{ Hz}$ ) and relatively fast digital signal processing. Using off-the-shelf processors, this processing burden requires up to  $1000 \times$  the power of the amplification process, which is hardly a good use of system power resources. In the spirit of the threshold detector for spikes, we next describe a modest adaptation of the chopper amplifier that extracts the key biomarkers of interest. With this simple but robust analog preprocessing step, the overall system power drops by more than two orders of magnitude, helping to enable a practical LFP measurement architecture.



**Fig. 15** Suppression of popcorn noise using chopper stabilization. The chopper reduced the noise floor by two orders of magnitude, to the thermal noise limit

## 5 Efficient Signal Processing: Bandpower Extraction of Field Potential Biomarkers

### A. Analog Preprocessing with the Heterodyning Chopper

The goal of the chopper-based analog preprocessing block is to extract bandpowers at key physiological frequencies with an architecture that is flexible, robust and low-noise; parts of this work were previously discussed in [56]. Chopper-stabilized amplifiers were adapted for this purpose to provide wide-dynamic-range, high-Q filters. The key design change from [51] is to displace the clocks within the chopper amplifier to translate the frequency of the signal. As illustrated in Fig. 16, the up-modulator is set to one frequency,  $F_{\text{clk}}$ . At node  $V_A$ , the signal is then centered about the  $F_{\text{clk}}$  modulation frequency, well above the excess aggressor noise ( $1/f$ , popcorn). Demodulation is performed with a second clock of frequency  $F_{\text{clk}2} = F_{\text{clk}} + \delta$ . Mixing with this demodulation clock re-centers the signal content located at  $\delta$  to dc and  $2\delta$  at node  $V_B$ . Since the biomarkers are encoded as low-frequency fluctuations of the spectral power, we can filter out the  $2\delta$  component with an on-chip two-pole lowpass filter with a bandwidth defined as  $BW/2$ ; signals on either side of  $\delta$  are aliased into the net pass-band at  $V_{\text{OUT}}$ . Unlike standard switched choppers, the heterodyning chopper suppresses harmonics as the square of the harmonic order, to yield a net output of

$$V_{\text{out}}(f) = \frac{4}{\pi^2} \cdot \sum_{n, \text{odd}} \frac{1}{n^2} \cdot V_{\text{in}}(f + \delta \cdot n) \cdot \cos(\phi), \quad (8)$$



**Fig. 16** Concept of merging heterodyning and chopper stabilization for flexible bandpass selection

where  $n$  denotes the harmonic order, and  $\phi$  is the phase between the  $\delta$  clock and the field potential input. To first order, therefore, the heterodyned chopper extracts a band equivalent to a fourth-order bandpass filter with a scale factor of  $4/\pi^2$ . The robustness of the design arises from the same features that make heterodyning attractive for AM radio applications – the center frequency is set by a programmable clock difference, which is simple to synthesize on-chip, while the bandwidth (and  $Q$ ) is set independently by a programmable lowpass filter with a quasi-Gaussian profile to minimize the frequency-time resolution constraint from information theory.
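The harmonic weighting in Eq. (8) can be tabulated to show how strongly the odd harmonics of  $\delta$  are suppressed relative to the fundamental. A small sketch with hypothetical helper names, using  $\delta = 20$  Hz purely as an example; the weights over all odd  $n$  sum to  $\frac{4}{\pi^2}\cdot\frac{\pi^2}{8} = \frac{1}{2}$ .

```python
import math

def heterodyne_weights(n_max):
    """Eq. (8) weights 4/(pi^2 n^2) for odd harmonics n <= n_max."""
    return {n: 4 / (math.pi ** 2 * n ** 2) for n in range(1, n_max + 1, 2)}

def mapped_input_frequency(f_out, delta, n):
    """Input frequency that Eq. (8) maps onto output frequency f_out via harmonic n."""
    return f_out + delta * n

w = heterodyne_weights(9)
for n, weight in w.items():
    print(f"n={n}: weight={weight:.4f} "
          f"({20 * math.log10(weight / w[1]):.1f} dB re n=1), "
          f"dc output fed from input at {mapped_input_frequency(0.0, 20.0, n):.0f} Hz")
```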

Since the “brain-under-test” and the IC clocks are uncorrelated, the phase of the signal, $\phi$, must also be accounted for in the circuit design. To address the phasing issue, two parallel heterodyning amplifiers are used, driven with “in-phase” ($I$) and “quadrature” ($Q$) clocks created with on-chip distribution circuits; the net signal flow graph is illustrated in Fig. 17. The net power extraction,

$$V_{EEG.Power}(f) = \left[ \frac{4}{\pi^2} \cdot \sum_{n,odd} \frac{1}{n^2} \cdot V_{in}(f + \delta \cdot n) \right]^2, \quad (9)$$

is achieved by superimposing the squared in-phase and quadrature signals. To achieve the lowest power possible, the superposition is performed with on-chip self-cascoded Gilbert mixers, which calculate the sum of squares by summing currents [54]. To prevent residual offsets in the tanh circuits from creating intermodulation products in the I and Q channels, the inputs to the Gilbert multipliers are chopped with a 64 Hz square wave. The power output signal is lowpass filtered to a bandwidth on the order of 1 Hz to track the essential dynamics of the biomarker, greatly easing resource requirements in the digital processing blocks.



**Fig. 17** The complete on-chip signal flow diagram for the LFP signal chain
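An idealized model shows why both the I and Q paths are needed. The following is a minimal sketch, assuming sinusoidal (not square-wave) reference clocks and a plain time average in place of the on-chip lowpass filter. For an input tone at the clock offset $\delta$ with unknown phase $\phi$, each channel alone varies as $\cos\phi$ or $\sin\phi$, while $I^2 + Q^2$ does not depend on $\phi$. The tone frequency and amplitude are illustrative.

```python
import numpy as np

fs = 10_000.0     # sample rate, Hz (illustrative)
delta = 20.0      # clock offset / band center, Hz (illustrative)
amp = 50e-6       # input tone amplitude, V (illustrative)
t = np.arange(int(fs * 1.0)) / fs   # 1 s = whole number of tone periods

def iq_power(phi):
    """Return (I, Q, I^2 + Q^2) for an input tone with phase phi."""
    v_in = amp * np.cos(2 * np.pi * delta * t + phi)
    # Averaging over whole periods stands in for the lowpass filter
    i_ch = np.mean(v_in * np.cos(2 * np.pi * delta * t))
    q_ch = np.mean(v_in * np.sin(2 * np.pi * delta * t))
    return i_ch, q_ch, i_ch**2 + q_ch**2

for phi in np.linspace(0, np.pi, 5):
    i_ch, q_ch, p = iq_power(phi)
    print(f"phi={phi:4.2f}  I={i_ch:+.2e}  Q={q_ch:+.2e}  I^2+Q^2={p:.3e}")
```

In this model $I = (A/2)\cos\phi$ and $Q = -(A/2)\sin\phi$, so the sum of squares stays at $A^2/4$ for every phase.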

In addition to bandpower extraction, the heterodyning chopper amplifier has several uses when the clock difference, $\delta$, is set to zero. The first application is to measure a standard time-domain neural signal without preprocessing, which can be useful for prescreening the waveforms to identify the spectral biomarkers of interest and to confirm algorithm functionality. The second application is to measure impedance by injecting a 10 $\mu$A stimulation current across the electrodes at the chopper clock frequency and fixing the state of the front-end modulators. The output of the in-phase channel then provides the real component of the impedance, while the output of the quadrature channel provides the imaginary (reactive) component. This measurement can be useful for characterizing electrodes and tissue properties.
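The impedance mode can be sketched in the same way. The example assumes, for illustration only, a 10 $\mu$A sinusoidal test current at a hypothetical 1 kHz and a hypothetical electrode impedance $Z = 1\text{ k}\Omega - j\,500\ \Omega$. Demodulating the electrode voltage against in-phase and quadrature references returns the real and imaginary parts of $Z$. The on-chip version uses square-wave choppers rather than the sinusoidal references assumed here.

```python
import numpy as np

fs = 100_000.0          # sample rate, Hz (illustrative)
f_test = 1_000.0        # test-current frequency, Hz (hypothetical)
i_amp = 10e-6           # test-current amplitude, A (from the text)
z_true = 1_000 - 500j   # hypothetical electrode impedance, ohms

t = np.arange(int(fs * 0.1)) / fs           # 0.1 s = 100 whole periods
w = 2 * np.pi * f_test
# Steady-state voltage produced by i(t) = i_amp*cos(wt) flowing through Z
v = i_amp * (z_true.real * np.cos(w * t) - z_true.imag * np.sin(w * t))

# In-phase and quadrature demodulation, then averaging (the lowpass step)
resistance = 2 * np.mean(v * np.cos(w * t)) / i_amp
reactance = -2 * np.mean(v * np.sin(w * t)) / i_amp
print(f"R = {resistance:.1f} ohm, X = {reactance:.1f} ohm")
```

The in-phase path recovers R = 1000 Ω, and the quadrature path recovers X = -500 Ω.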

### B. Circuit Testing

A heterodyning chopper amplifier channel was prototyped in a 0.8 $\mu$m CMOS process with high-resistance CrSi to verify the theory of operation.

The total IC power consumption was 7 $\mu$W from a 1.8 V supply: 5 $\mu$W was allocated to the heterodyning chopper chain and 2 $\mu$W to the support circuitry. Figure 18 illustrates the broad bandpower tuning capability of the chopper for biomarkers from 10 Hz to 500 Hz (trim steps are 5 Hz). This range of programmability covers both the known biomarkers detectable in surface EEG and significantly higher-frequency biomarkers such as those in [48]. Trim states are written from the microprocessor via an I2C port; they can either be adjusted as part of an algorithm (e.g. a swept-sine spectrogram) or locked in with an on-chip non-volatile memory array.
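The stated range and step size imply a grid of 99 selectable center frequencies. The following minimal sketch snaps a requested biomarker frequency onto that grid. The integer it returns is a hypothetical trim index for illustration only. It is not the chip's actual I2C register encoding, which is not described here.

```python
F_MIN_HZ = 10.0    # lowest selectable center frequency (from the text)
F_MAX_HZ = 500.0   # highest selectable center frequency (from the text)
STEP_HZ = 5.0      # trim step (from the text)

N_STATES = int(round((F_MAX_HZ - F_MIN_HZ) / STEP_HZ)) + 1   # 99 grid points

def trim_index(f_target_hz: float) -> int:
    """Hypothetical index of the grid point nearest to f_target_hz (clamped to range)."""
    clamped = min(max(f_target_hz, F_MIN_HZ), F_MAX_HZ)
    return int(round((clamped - F_MIN_HZ) / STEP_HZ))

def center_frequency(index: int) -> float:
    """Center frequency selected by a given hypothetical trim index."""
    if not 0 <= index < N_STATES:
        raise ValueError("trim index out of range")
    return F_MIN_HZ + index * STEP_HZ

print(N_STATES, trim_index(152.0), center_frequency(trim_index(152.0)))
```

Under this scheme, a request for 152 Hz maps to the 150 Hz grid point, and requests outside the range are clamped to 10 Hz or 500 Hz.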



**Fig. 18** Demonstration of band selectivity with the brain radio across a broad selection of physiologically relevant frequencies

The differential clock performance is crucial to proper operation of the signal chain. The maximum differential clock jitter was bounded ($3\sigma$) to $\pm 1$ Hz using a 150 nA channel bias current, and the mean clock drift was approximately 0.1 Hz/°C. The tight differential clock tolerance ensures robust programmability using on-chip oscillators.
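A rough estimate puts the $\pm 1$ Hz bound in context. A tone offset by $\Delta f$ from $\delta$ lands at $\Delta f$ after demodulation, so its gain is simply the lowpass response at $\Delta f$. The sketch assumes the lowpass is two coincident real poles with a -3 dB bandwidth of $BW/2 = 5$ Hz, matching the $BW = 10$ Hz setting quoted for the noise measurement. The actual quasi-Gaussian filter shape will differ somewhat.

```python
import math

def two_pole_gain(f_hz: float, f3db_hz: float) -> float:
    """|H(f)| of two coincident real poles, scaled so |H(f3db)| = 1/sqrt(2)."""
    # For H(s) = 1/(1 + s/wp)^2, |H| = 1/(1 + (f/fp)^2), and the -3 dB point
    # is f3db = fp*sqrt(sqrt(2) - 1), so fp = f3db / sqrt(sqrt(2) - 1).
    f_pole = f3db_hz / math.sqrt(math.sqrt(2) - 1)
    return 1.0 / (1.0 + (f_hz / f_pole) ** 2)

bw_half = 5.0                      # BW/2 in Hz for BW = 10 Hz (from the text)
for offset in (0.1, 0.5, 1.0):     # center-frequency error, Hz
    g = two_pole_gain(offset, bw_half)
    print(f"offset {offset:.1f} Hz -> gain {g:.4f} ({20 * math.log10(g):.2f} dB)")
```

Under these assumptions, a 1 Hz error costs only about 0.14 dB of in-band gain.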

The signal chain's noise floor was measured to be approximately $(1 \mu V_{rms})^2$ with the channel programmed to $BW = 10$ Hz and $BW_{power} = 1$ Hz. This is in excellent agreement with theoretical expectations and suitable for detecting relevant biomarkers for neuroprostheses. To demonstrate the IC in its intended application, Fig. 19 shows the response to a 50 $\mu$V peak-to-peak test tone similar to the motor intention signal in Fig. 12. The IC output power reflects the low-frequency biomarker of interest, which can now be sampled by the microprocessor at a low rate to save significant system power. As in the spike-based system, extracting the information prior to digitization also minimizes the power needed for algorithmic computation and telemetry.



**Fig. 19** Demonstration of the signal chain power output (dark trace) in response to a 152 Hz, 50 $\mu$V peak-to-peak test tone

## 6 Summary of Neural Amplifier Design Strategies

This chapter discussed prototypes for practical prosthetic interface systems using the two major areas of focus in state-of-the-art neural recording: single-cell spike-based sensing and cellular ensemble field potential sensing. The choice of sensing paradigm is strongly dependent on the intended application. When the highest-fidelity mapping of cortical function is required, the system designer will most likely tend towards the resolution offered by ‘spike’ based systems; neuroprostheses requiring fine motor control are an example of such a system. Measurement of gross circuit activity to detect information such as general awareness, intention to move, and seizures does not require spike-based signal resolution. In these cases, processing architectures that shift to more classical spectral analysis techniques provide definite power savings and advantages for chronic sensing. Each approach has its trade-offs, and the appropriate choice is dictated by the sensing requirements of the intended use.

Although the detailed constraints of each application are quite different, the design techniques employed are similar. Both applications focus on efficiently extracting neuronal biomarkers using analog preprocessing prior to digitization, algorithmic processing, and/or telemetry. With this strategy, overall system power is greatly reduced with minimal trade-offs in algorithm performance. To select the appropriate method of analog preprocessing, the circuits are tailored to the specific features of the signal that define the relevant ‘biomarker.’ As the field of neural engineering continues to develop, IC designers will benefit from carefully studying the nature of the bioelectrical signals of interest and architecting circuits as part of the overall biophysical neural network.

## References

1. J. Pine, “Recording action potentials from cultured neurons with extracellular microcircuit electrodes,” *J. Neurosci. Methods*, vol. 2, pp. 19–31, 1980.
2. A.C. Hoogerwerf and K.D. Wise, “A three-dimensional microelectrode array for chronic neural recording,” *IEEE Trans. Biomed. Eng.*, vol. 41, pp. 1136–1146, Dec. 1994.
3. A.L. Owens, T.J. Denison, H. Versnel, M. Rebbert, M. Peckerar, and S.A. Shamma, “Multi-electrode array for measuring evoked potentials from the surface of ferret primary auditory cortex,” *J. Neurosci. Methods*, vol. 58, pp. 209–220, 1995.
4. C.T. Nordhausen, E.M. Maynard, and R.A. Normann, “Single unit recording capabilities of a 100 microelectrode array,” *Brain Research*, vol. 726, pp. 129–140, 1996.
5. K. Najafi and K.D. Wise, “An implantable multielectrode array with on-chip signal processing,” *IEEE JSSC*, vol. 21, pp. 1035–1044, Dec. 1986.
6. K.D. Wise, D.J. Anderson, J.F. Hetke, D.R. Kipke, and K. Najafi, “Wireless implantable microsystems: high-density electronic interfaces to the nervous system,” *Proc. IEEE*, vol. 92, pp. 76–97, Jan. 2004.
7. R.H. Olsson III and K.D. Wise, “A three-dimensional neural recording microsystem with implantable data compression circuitry,” *IEEE J. Solid-State Cir.*, vol. 40, pp. 2796–2804, Dec. 2005.
8. R.R. Harrison, P.T. Watkins, R.J. Kier, R.O. Lovejoy, D.J. Black, B. Greger, and F. Solzbacher, “A low-power integrated circuit for a wireless 100-electrode neural recording system,” *IEEE J. Solid-State Cir.*, vol. 42, pp. 123–133, Jan. 2007.
9. T.M. Seese, H. Harasaki, G.M. Saidel, and C.R. Davies, “Characterization of tissue morphology, angiogenesis, and temperature in the adaptive response of muscle tissue to chronic heating,” *Lab. Investigation*, vol. 78(12), pp. 1553–1562, 1998.
10. J.C. LaManna, K.A. McCracken, M. Patil, and O.J. Prohaska, “Stimulus-activated changes in brain tissue temperature in the anesthetized rat,” *Metabolic Brain Disease*, vol. 4, pp. 225–237, 1989.
11. A. Jackson, J. Mavoori, and E.E. Fetz, “Long-term motor cortex plasticity induced by an electronic neural implant,” *Nature*, vol. 444, pp. 56–60, 2006.

12. J.K. Chapin, K.A. Moxon, R.S. Markowitz, and M.A.L. Nicolelis, "Real-time control of a robot arm using simultaneously recorded neurons in the motor cortex," *Nature Neurosci.*, vol. 2, pp. 664–670, 1999.
13. J. Wessberg, C.R. Stambaugh, J.D. Kralik, P.D. Beck, M. Laubach, J.K. Chapin, J. Kim, S.J. Biggs, M.A. Srinivasan, and M.A.L. Nicolelis, "Real-time prediction of hand trajectory by ensembles of cortical neurons in primates," *Nature*, vol. 408, pp. 361–365, 2000.
14. M.D. Serruya, N.G. Hatsopoulos, L. Paninski, M.R. Fellows, and J.P. Donoghue, "Instant neural control of a movement signal," *Nature*, vol. 416, pp. 141–142, 2002.
15. D.M. Taylor, S.I.H. Tillery, and A.B. Schwartz, "Direct cortical control of 3-D neuroprosthetic devices," *Science*, vol. 296, pp. 1829–1832, 2002.
16. P.R. Kennedy, R.A.E. Bakay, M.M. Moore, K. Adams, and J. Goldthwaite, "Direct control of a computer from the human central nervous system," *IEEE Trans. Rehab. Eng.*, vol. 8, pp. 198–202, June 2000.
17. L.R. Hochberg, M.D. Serruya, G.M. Friehs, J.A. Mukand, M. Saleh, A.H. Caplan, A. Branner, D. Chen, R.D. Penn, and J.P. Donoghue, "Neuronal ensemble control of prosthetic devices by a human with tetraplegia," *Nature*, vol. 442, pp. 164–171, 2006.
18. E.R. Kandel, J.H. Schwartz, and T.M. Jessell, *Principles of Neural Science*, 4th ed. Boston, MA: McGraw-Hill, 2000.
19. R.C. Gesteland, B. Howland, J.Y. Lettvin, and W.H. Pitts, "Comments on microelectrodes," *Proc. IRE*, vol. 47, pp. 1856–1862, 1959.
20. C.D. Ferris, *Introduction to Bioinstrumentation*. Humana, 1978.
21. R.R. Harrison, "A versatile integrated circuit for the acquisition of biopotentials," submitted to *IEEE Custom Integrated Circuits Conf.*, pp. 115–122, Sept. 2007.
22. A.C. Metting van Rijn, A. Peper, and C.A. Grimbergen, "High-quality recording of bioelectric events," *Med. Biol. Eng. Comput.*, vol. 29, pp. 1035–1044, 1986.
23. V.N. Murthy and E.E. Fetz, "Coherent 25- to 35-Hz oscillations in the sensorimotor cortex of awake behaving monkeys," *Proc. Natl. Acad. Sci. USA*, vol. 89, pp. 5670–5674, 1992.
24. J.P. Donoghue, J.N. Sanes, N.G. Hatsopoulos, and G. Gaal, "Neural discharge and local field potential oscillations in primate motor cortex during voluntary movements," *J. Neurophysiol.*, vol. 79, pp. 159–173, 1998.
25. K.V. Shenoy, M.M. Churchland, G. Santhanam, B.M. Yu, and S.I. Ryu, "Influence of movement speed on plan activity in monkey pre-motor cortex and implications for high-performance neural prosthetic system design," In: *Proc. 2003 Intl. Conf. of the IEEE Eng. in Medicine and Biology Soc.*, pp. 1897–1900 Cancún, Mexico, 2003.
26. C. Mehring, J. Rickert, E. Vaadia, S. Cardoso de Oliveira, A. Aertsen, S. Rotter, "Inference of hand movements from local field potentials in monkey motor cortex," *Nature Neurosci.*, vol. 6, pp. 1253–1254, 2003.
27. H. Scherberger, M.R. Jarvis, and R.A. Andersen, "Cortical local field potential encodes movement intentions in the posterior parietal cortex," *Neuron*, vol. 46, pp. 347–354, 2005.
28. B. Pesaran, J.S. Pezaris, M. Sahani, P.P. Mitra, and R.A. Andersen, "Temporal structure in neuronal activity during working memory in macaque parietal cortex," *Nature Neurosci.*, vol. 5, pp. 805–811, 2002.
29. S. Kim, R.A. Normann, R. Harrison, and F. Solzbacher, "Preliminary study of thermal impacts of a microelectrode array implanted in the brain," In: *Proc. 2006 Intl. Conf. of the IEEE Eng. in Medicine and Biology Soc.*, pp. 2986–2989, New York, NY, 2006.
30. S. Kim, P. Tathireddy, R. Normann, and F. Solzbacher, "*In vitro* and *in vivo* study of temperature increases in the brain due to a neural implant," In: *Proc. 3rd Intl. IEEE EMBS Conf. on Neural Engineering*, Kohala Coast, HI, 2007.
31. M.S.J. Steyaert, W.M.C. Sansen, and C. Zhongyuan, "A micropower low-noise monolithic instrumentation amplifier for medical purposes," *IEEE J. Solid-State Cir.*, vol. 22, pp. 1163–1168, Dec. 1987.
32. R.R. Harrison and C. Charles, "A low-power low-noise CMOS amplifier for neural recording applications," *IEEE J. Solid-State Cir.*, vol. 38, pp. 958–965, June 2003.

33. E.A. Vittoz and J. Fellrath, "CMOS analog integrated circuits based on weak inversion operation," *IEEE J. Solid-State Circ.*, vol.12, pp.224–231, 1977.
34. C. Mead, *Analog VLSI and Neural Systems*, Reading, MA: Addison-Wesley, 1989.
35. C.C. Enz, F. Krummenacher, and E.A. Vittoz, "An analytical MOS transistor model valid in all regions of operation and dedicated to low-voltage and low-current applications," *Analog Integrat. Circuits Signal Process.*, vol. 8, pp. 83–114, 1995.
36. Y. Tsividis, *Operation and Modeling of the MOS Transistor*, 2nd ed. Boston, MA: McGraw-Hill, 1998.
37. D.A. Johns and K. Martin, *Analog Integrated Circuit Design*, New York, NY: John Wiley & Sons, 1997.
38. K.S. Guillory and R.A. Normann, "A 100-channel system for real time detection and storage of extracellular spike waveforms," *J. Neurosci. Methods*, vol. 91, pp. 21–29, 1999.
39. K.S. Guillory, A.K. Misener, and A. Pungor, "Hybrid RF/IR transcutaneous telemetry for power and high-bandwidth data," in *Proc. 2004 Intl. Conf. IEEE Engineering in Medicine and Biology Soc. (EMBC 2004)*, San Francisco, CA, pp. 4338–4340, 2004.
40. K.A. Boahen, "Point-to-point connectivity between neuromorphic chips using address-events," *IEEE Trans. Circuits and Systems II*, vol. 47, pp. 416–434, May 2000.
41. R.R. Harrison, "A low-power integrated circuit for adaptive detection of action potentials in noisy signals," In: *Proc. 2003 Intl. Conf. of the IEEE Eng. in Medicine and Biology Soc.*, pp. 3325–3328, Cancún, Mexico, 2003.
42. P.T. Watkins, G. Santhanam, K.V. Shenoy, and R.R. Harrison, "Validation of adaptive threshold spike detector for neural recording," In: *Proc. 2004 Intl. Conf. of the IEEE Eng. in Medicine and Biology Soc.*, pp. 4079–4082, San Francisco, CA, 2004.
43. R.R. Harrison, G. Santhanam, and K.V. Shenoy, "Local field potential measurement with low-power analog integrated circuit," In: *Proc. 2004 Intl. Conf. of the IEEE Eng. in Medicine and Biology Soc.*, pp. 4067–4070, San Francisco, CA, 2004.
44. T.K. Horiuchi, T. Swindell, D. Sander, and P. Abshire, "A low-power CMOS neural amplifier with amplitude measurements for spike sorting," In: *Proc. 2004 IEEE Intl. Symp. on Circuits and Systems*, vol. 4, pp. 29–32, Vancouver, BC, Canada, 2004.
45. Z.S. Zumsteg, C. Kemere, S. O'Driscoll, G. Santhanam, R.E. Ahmed, K.V. Shenoy, and T.H. Meng, "Power feasibility of implantable digital spike-sorting circuits for neural prosthetic systems," *IEEE Trans. Neural Systems and Rehabilitation*, vol. 13, pp. 272–279, Sept. 2005.
46. A.F. Atiya, "Recognition of multiunit neural signals," *IEEE Trans. Biomed. Eng.*, vol. 39, pp. 723–729, July 1992.
47. G. Santhanam, M.D. Linderman, V. Gilja, A. Afshar, S.I. Ryu, T.H. Meng, and K.V. Shenoy, "HermesB: a continuous neural recording system for freely behaving primates," *IEEE Trans. Biomed. Eng.*, vol. 54, Issue: 11, pp. 2037–2050, Nov. 2007.
48. D.A. Heldman et al., "Local field potential spectral tuning in motor cortex during reaching," *IEEE Trans. Neural Systems and Rehabil. Eng.*, vol. 14, no. 2, 2006.
49. A.B. Schwartz et al., "Brain-Controlled Interfaces: Movement Restoration with Neural Prosthetics," *Neuron*, vol. 52, pp. 205–220, 2006.
50. R.R. Harrison et al., "A Low-Power Integrated Circuit for a Wireless 100-Electrode Neural Recording System," *JSSC*, vol. 42, pp. 123–133, 2007.
51. Denison et al., "A 2.2 uW 94 nV/rtHz, Chopper-Stabilized Instrumentation Amplifier for EEG Detection in Chronic Implants," *JSSC*, vol. 42, no. 12, pp. 2934–2945, 2007.
52. R.F. Yazicioglu, P. Merken, R. Puers, and C. Van Hoof, "A 60 uW 60 nV/rtHz Readout Front-End for Portable Biopotential Acquisition Systems," *IEEE JSSC*, vol. 42, no. 5, pp. 1100–1110, 2007.
53. C.D. Salthouse and R. Sarpeshkar, "A practical micropower programmable bandpass filter for use in bionic ears," *JSSC*, Vol. 38,pp. 63–7
54. R.R. Harrison, G. Santhanam, and K.V. Shenoy, "Local field potential measurement with low-power analog integrated circuit," In: *Proc. 2004 Intl. Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2004)* , San Francisco, CA, pp. 4067–4070, 2004.

55. R. Sarpeshkar, "Borrowing from biology makes for low-power computing," *IEEE Spectrum*, pp. 24–29, May 2006.
56. T.J. Denison et al., "An 8 uW heterodyning chopper amplifier for direct extraction of 2 uVrms brain biomarkers," ISSCC 2008, paper 8.1.
57. K. Makinwa, "Dynamic Offset Compensation Techniques," ISSCC 2007.
58. M. Sanduleanu, "A low noise, low residual offset, chopped amplifier for mixed level applications," *Proc. IEEE Int. Conf. Electron. Circuits and Systems*, 1998, vol. 2, pp. 333–336.

# Transimpedance Amplifiers for Extremely High Sensitivity Impedance Measurements on Nanodevices

Giorgio Ferrari, Fabio Gozzini and Marco Sampietro

**Abstract** The paper highlights the critical aspects in the design of high-performance transimpedance amplifiers for the electrical characterisation of nano-biodevices. Current sensitivity, bandwidth, dynamic range and leakage-current discharge are discussed in light of the tight requirements of impedance spectroscopy measurements at the nanoscale. An implementation in a standard 0.35 $\mu\text{m}$ CMOS technology using a dual power supply of $\pm 1.5$ V is described in detail. Thanks to an active resistor with an equivalent value of up to 300 G$\Omega$ and minimum noise, the resulting transimpedance amplifier operates from a few Hz. Its operative dynamic range for ac current signals is independent of the amount of leakage current, and it allows an unlimited measuring time, making it ideal for attofarad capacitance measurements of biological samples in their physiological medium.

## 1 Introduction

The electrical characterization of single molecules and nanometer-scale devices (referred to as the Device Under Test, DUT) requires the capability of detecting extremely low signals, a consequence of their very small dimensions and very poor conductance. In most cases, and irrespective of the specific measurement to be performed (quasi-static current-voltage curves, impedance spectroscopy, noise analysis), the electrical quantity to be sensed is a current, whose value may be well below 1 pA in present-day nano-bio research [1–4].

Transimpedance amplifiers are perfectly suited to this task. In these amplifiers, the signal current made available by the DUT is converted into a voltage with maximum signal-to-noise ratio, ready for further processing. Thanks to the input virtual ground provided by the feedback architecture (see Fig. 1), the current flowing in the DUT can be measured with high accuracy irrespective of the overwhelming stray capacitances introduced by the connections. By applying a sinusoidal input voltage and sensing the in-phase and in-quadrature current components, the device impedance can be extracted as a function of frequency. Indeed, impedance spectroscopy is an essential tool to study the frequency response of a variety of systems and to obtain characteristic parameters of devices such as dielectric constant, charge carrier density, junction capacitance and electron mobility [5–8]. It is also used in electrochemistry [9, 10] to study interface adsorption and charge transfer reactions.

---

G. Ferrari (✉)

Politecnico di Milano, Dipartimento di Elettronica e Informazione P.zza L.da Vinci 32,  
20133 Milano, Italy



**Fig. 1** Measurement setup for the electrical characterization of a device (DUT) exploiting the peculiarities of transimpedance amplifiers. $C_{p1}$ and $C_{p2}$ are the stray capacitances introduced by the connections, which do not influence the accuracy of the measurement

In the scheme of Fig. 1, the impedance resolution (maximum detectable ac resistance and minimum detectable capacitance) and the frequency range of the measurement are essentially determined by the transimpedance amplifier. Once the noise floor of the current-to-voltage amplifier is fixed, higher resolution can only be achieved by acquiring the signal for longer to filter out the noise fluctuations, at the penalty of slowing down the measurement system. Therefore, effort must be put into designing very low noise transimpedance amplifiers with adequate bandwidth to meet the demand for very high sensitivity.
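The trade-off between noise floor and measuring time can be made concrete with an order-of-magnitude estimate. The sketch assumes a white input current-noise density $i_n$ and an averaging time $T$ with an equivalent noise bandwidth of about $1/(2T)$, as for a simple boxcar average. A real lock-in filter changes the constant but not the $1/\sqrt{T}$ trend. It returns the capacitance whose current $2\pi f C V$ equals the rms noise. The 4 fA/$\sqrt{\text{Hz}}$ (roughly the thermal noise of a 1 G$\Omega$ resistor at 300 K), 100 mV and 10 kHz values are illustrative.

```python
import math

def min_capacitance(i_noise: float, t_avg: float, freq_hz: float, v_amp: float) -> float:
    """Capacitance whose current equals the rms noise after averaging for t_avg.

    i_noise: white input current-noise density, A/sqrt(Hz) (assumed)
    t_avg:   averaging time, s; ENBW taken as 1/(2*t_avg) (boxcar assumption)
    """
    enbw = 1.0 / (2.0 * t_avg)
    i_rms = i_noise * math.sqrt(enbw)
    return i_rms / (2 * math.pi * freq_hz * v_amp)

for t in (0.01, 0.1, 1.0):
    c = min_capacitance(4e-15, t, 1e4, 0.1)
    print(f"T = {t:5.2f} s -> C_min ~ {c * 1e18:.2f} aF")
```

Each tenfold increase in averaging time improves the resolution only by $\sqrt{10}$, which is why lowering the noise floor itself is the more effective route.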

Bandwidth is required not only to extend the frequency range of impedance measurements and to increase sensitivity. Fast circuits are also needed to track the time evolution of the nanoscopic system. One example is discriminating ionic current variations to detect the chemical identity of single molecules passing through an ion channel [11, 12]. Another is operating with scanning probe microscopes (SPM) in fast scanning mode, so as to investigate dynamical processes and reduce the effects of mechanical drift and environmental noise [13]. In addition, the ability to discharge steady currents of non-negligible value is also an important feature. This is the case, for example, when measuring the capacitance of a living biomolecule in its physiological medium with a precision better than one attofarad, in the presence of large steady leakage currents [14, 15].
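A sense of scale helps here, and it also shows why bandwidth buys sensitivity. A purely capacitive DUT driven by a sinusoid of amplitude $V$ at frequency $f$ carries a current of amplitude $2\pi f C V$, which grows linearly with frequency. The values below (1 aF, 100 mV) are illustrative assumptions, not measurement conditions from this chapter.

```python
import math

def capacitive_current(freq_hz: float, cap_f: float, v_amp: float) -> float:
    """Amplitude of the in-quadrature current through capacitance cap_f."""
    return 2 * math.pi * freq_hz * cap_f * v_amp

for f in (1e3, 1e6):
    i = capacitive_current(f, 1e-18, 0.1)   # 1 aF driven with 100 mV amplitude
    print(f"f = {f:.0e} Hz: i = {i:.2e} A")
```

At 1 kHz the signal is well below 1 fA. Moving the measurement to 1 MHz raises it a thousandfold.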

## 2 Transimpedance Amplifier Architectures

The classic configuration of a transimpedance amplifier (see Fig. 2a) uses a simple resistor in the feedback path of an Operational Amplifier (OpAmp). It cannot be adopted in single-chip realisations because of the difficulty of integrating a stable linear resistor of sufficiently high value ($G\Omega$ values are currently used in discrete realisations), and because of the limited bandwidth caused by the unavoidable stray capacitances in parallel with the resistor itself. Note that the feedback resistor defines the sensitivity of the transimpedance amplifier, as it sets the current noise at the input node ($4kT/R_f$). It also defines the precision of the instrument, as it sets the current-to-voltage conversion factor. Therefore, to improve the current resolution, the feedback resistor must be chosen as large as possible.



**Fig. 2** Transimpedance amplifier configurations: (a) classical, (b) pole-zero compensated and (c) integrator-differentiator
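The $4kT/R_f$ term translates into the current-noise densities below, assuming $T = 300$ K. The numbers show why G$\Omega$-class feedback resistance is needed to approach the fA/$\sqrt{\text{Hz}}$ level.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K

def resistor_current_noise(r_ohm: float, temp_k: float = 300.0) -> float:
    """Thermal current-noise density sqrt(4kT/R) of a resistor, in A/sqrt(Hz)."""
    return math.sqrt(4 * K_B * temp_k / r_ohm)

for r in (1e6, 1e9, 1e12):
    print(f"R = {r:.0e} ohm -> {resistor_current_noise(r) * 1e15:.2f} fA/sqrt(Hz)")
```

The noise density falls only as $1/\sqrt{R_f}$: each factor of 1000 in resistance buys about a factor of 32 in current noise.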

The lack of high-value, stable and linear resistor-like structures in most of the available integrated technologies has prompted the design of alternative transimpedance amplifier topologies. First, to extend its bandwidth, the circuit of Fig. 2a can be followed by an amplifier whose gain increases at frequencies greater than $1/(2\pi R_f C_i)$, as in the example of Fig. 2b, obtaining a flat overall frequency response [16, 17].

In the scheme of Fig. 2b, a precise pole-zero compensation is obtained when $R_f C_i = R_z C_d$. The disadvantage of this solution is that an accurate calibration of the pole-zero compensation is needed to obtain a perfectly flat frequency response over the desired bandwidth. Usually, in a fully integrated solution, the two resistors are replaced by a matched pair of non-linear devices, typically transistors, suitably operated (ohmic, saturation or subthreshold region) to obtain the desired conductance value [17, 18]. In this case, the input offset of the amplifiers unbalances the voltage across the transistors and sets their channel conductances to different values. This gives an unavoidable mismatch between the resistance and capacitance ratios that may prevent a flat response: at low frequencies the response is given by the ratio of the channel resistances, and at high frequencies by the ratio of the capacitances. Furthermore, the frequency of this gain discontinuity shifts as the dc current from the DUT changes, because the channel resistances change their absolute values. These effects become important when the current is reduced, typically below 1 pA, irrespective of the transistor operating region, that is, when the transistor voltages become comparable to the offset of the amplifiers. A final disadvantage of this configuration is that the leakage from the DUT changes the output voltage of the OpAmp and reduces the dynamic range available for the signal.
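An idealized model illustrates the effect of an imperfect pole-zero cancellation in Fig. 2b. It assumes a first stage with transimpedance $R_f/(1 + sR_fC_i)$ followed by a second stage with gain $(1 + sR_zC_d)$, and it ignores the OpAmps' finite bandwidth. The component values are hypothetical. With matched time constants the response is flat at $R_f$. A 10 % mismatch produces a step from $R_f$ at low frequency to $R_zC_d/C_i$ at high frequency, located around $1/(2\pi R_fC_i)$.

```python
import numpy as np

def transimpedance(f_hz, r_f, c_i, r_z, c_d):
    """Idealized Fig. 2b response: R_f/(1 + s R_f C_i) times (1 + s R_z C_d)."""
    s = 2j * np.pi * np.asarray(f_hz)
    first_stage = r_f / (1 + s * r_f * c_i)   # feedback resistor with parallel C_i
    second_stage = 1 + s * r_z * c_d          # compensating zero
    return first_stage * second_stage

freqs = np.logspace(0, 6, 7)                  # 1 Hz ... 1 MHz
r_f, c_i = 1e9, 100e-15                       # hypothetical: 1 GOhm with 100 fF
matched = transimpedance(freqs, r_f, c_i, r_z=1e6, c_d=100e-12)       # R_z C_d = R_f C_i
mismatched = transimpedance(freqs, r_f, c_i, r_z=1.1e6, c_d=100e-12)  # 10 % error

for f, zm, zx in zip(freqs, np.abs(matched), np.abs(mismatched)):
    print(f"{f:8.0e} Hz  matched {zm:.3e} ohm  mismatched {zx:.3e} ohm")
```

Because the step sits near 1.6 kHz with these values, a mismatch shows up as a gain error across most of a wide measurement band, not as a small ripple.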

Replacing the feedback resistor $R_f$ with a switched-capacitor resistor is not viable in our context of high-sensitivity, wide-bandwidth applications. Since a clock frequency greater than the signal bandwidth is required to avoid aliasing effects, the charge injection during switching becomes a critical parameter. For example, a clock frequency of 10 MHz and a charge injection as low as 1 fC give a spurious current of 10 nA! The switched-capacitor concept has been applied successfully only to the measurement of small capacitance variations of purely capacitive DUTs [19, 20].

## 3 Integrator-Differentiator Scheme

The ideal architecture would be the integrator-differentiator scheme of Fig. 2c. Here the feedback resistor of the OpAmp is replaced by a capacitor of very stable value, giving a wide-bandwidth integrating stage, and a differentiating amplifier is added to recover the desired linear relationship between input current and output voltage. Obviously, a reset element must be added in parallel with the capacitance to prevent saturation of the integrator stage by the leakage currents flowing into the input node from the DUT or from the OpAmp. This architecture offers the best trade-off in terms of three properties. The first is minimum noise: besides the OpAmp noise, only the resistor $R_d$ along the signal path affects the input noise, and its contribution is reduced by the square of the amplification factor $C_d/C_i$. The second is accuracy of the current-to-voltage conversion factor, the gain being given by the ratio of two capacitances. The third is large bandwidth, practically approaching the gain-bandwidth product (GBP) of the OpAmp.

Note that technology aspects play an important role in defining the amplifier characteristics. Assume a maximum practical value of $R_d$ in the range of 100 k$\Omega$ and a maximum practical value of $C_d$ of 20 pF. The input-referred noise of $R_d$ should be equivalent to that of a 4 G$\Omega$ resistor, and therefore negligible with respect to the white noise of any practical reset network. This requires a ratio $C_d/C_i > 200$, thus setting $C_i$ in the 100 fF range. Given a DUT-plus-strays capacitance of 1 pF and an OpAmp GBP of 100 MHz (see Section 7 for details), the overall operating frequency limit of the transimpedance amplifier is around 10 MHz. This is well above the limit of the classical scheme of Fig. 2a and adequate for a large set of applications.
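The numbers in this paragraph follow from two relations. First, the noise of $R_d$ referred to the input equals that of a resistor $R_d\,(C_d/C_i)^2$. Second, the closed-loop bandwidth of the integrator is approximately $\text{GBP}\cdot C_i/(C_i + C_{in})$, where $C_{in}$ is the DUT-plus-stray capacitance. The second relation is a standard single-pole feedback-factor approximation assumed here, not a formula given in the text.

```python
r_d = 100e3        # maximum practical R_d, ohm (from the text)
c_d = 20e-12       # maximum practical C_d, F (from the text)
ratio = 200        # C_d / C_i (from the text)
c_in = 1e-12       # DUT + stray capacitance at the input, F (from the text)
gbp = 100e6        # OpAmp gain-bandwidth product, Hz (from the text)

c_i = c_d / ratio                   # integrating capacitance
r_noise_eq = r_d * ratio ** 2       # input-referred noise of R_d as an equivalent resistor
beta = c_i / (c_i + c_in)           # capacitive feedback factor of the integrator
f_max = gbp * beta                  # single-pole estimate of the bandwidth

print(f"C_i = {c_i * 1e15:.0f} fF")
print(f"R_d noise equivalent = {r_noise_eq / 1e9:.1f} GOhm")
print(f"bandwidth ~ {f_max / 1e6:.1f} MHz")
```

The estimate of about 9 MHz agrees with the "around 10 MHz" quoted above. It also shows that the bandwidth is set by the ratio $C_i/(C_i + C_{in})$ rather than by the full GBP when the input capacitance dominates.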

The low-frequency limit of operation of the transimpedance amplifier in the integrator-differentiator configuration instead depends on the details of the integrator reset system. A simple switch in parallel with $C_i$, operated when the integrator output voltage reaches a defined threshold, would extend operation well into the low-frequency region. However, it would limit the time interval available to measure the DUT current. This time depends on both the DUT leakage current and $C_i$: if $C_i = 100 \text{ fF}$ and $I_{\text{leak}} = 10 \text{ nA}$, the discharge period is only a few tens of $\mu\text{s}$. This is much shorter than the time required to measure the impedance accurately in many applications.
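The "few tens of $\mu$s" figure follows from $t = C_i\,\Delta V / I_{\text{leak}}$. The usable output swing $\Delta V$ is not stated. The sketch assumes 3 V (the full $\pm 1.5$ V supply) as an upper bound.

```python
def reset_interval(c_i: float, delta_v: float, i_leak: float) -> float:
    """Time for a leakage current to ramp the integrator output by delta_v."""
    return c_i * delta_v / i_leak

t_reset = reset_interval(c_i=100e-15, delta_v=3.0, i_leak=10e-9)  # 3 V swing assumed
print(f"{t_reset * 1e6:.0f} us between resets")
```

Even with the full supply swing, the integrator saturates in about 30 $\mu$s, far too soon for a measurement that needs long averaging.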

To obtain an unlimited measuring time, the switch should be replaced by a continuously active system that removes the dc component but leaves the signal untouched over a large bandwidth. In other words, the integrator stage should behave like a pure integrator starting from very low frequencies, and consequently the singularities of the reset system should be placed at even lower frequencies, namely in the tens of Hz range. Because of this requirement, the successful solutions available in the literature for resetting the feedback capacitance of charge preamplifiers [21–23] cannot be directly transferred to transimpedance amplifiers. Those architectures are not conceived to have poles at such low frequencies, and they do not allow an equal number of low-frequency zeros to be inserted to provide the necessary feedback stability. In addition, when charge preamplifiers are used as transimpedance amplifiers, the mean integrator output voltage is defined by the dc leakage current. They therefore lose the rail-to-rail dynamic range for ac current signals that would otherwise be independent of the amount of dc current.

In the following, we analyze a fully integrated solution to these problems. It consists of a stable active reset network with poles and zeros in the Hz range, which provides a dc path to ground for the DUT leakage current, a rail-to-rail dynamic range for the signal and a signal bandwidth extending up to a few MHz.

## 4 Active Discharge System

The conceptual scheme of the proposed active reset circuit, made of an amplifier $H(s)$ in series with a resistive element $R_{dc}$, is shown in Fig. 3. The amplifier $H(s)$ has a gain from node A to node B greater than 1 for the dc component and a strong attenuation in the signal bandwidth, as sketched in the figure.

The loop gain of the new feedback loop can be written (see details in [24]) as

$$G_{loop} = H(s) \frac{A}{1 + s(1 + A)C_i R_{dc}} \quad (1)$$

where $A$ is the gain of the integrator OpAmp. At low frequencies, the loop gain is large enough to control the voltage across $R_{dc}$ and to collect the leakage current, $I_{dc}$, in the resistor. At higher frequencies, the feedback is not active and consequently does not affect the input signal, $i_s$, which is integrated on the capacitance $C_i$. Therefore, the frequency $f_m$ at which $|G_{loop}(f_m)| = 1$ defines the lower limit of the signal bandwidth.



**Fig. 3** Concept of the feedback network to discharge the standing current from the DUT in the integrator stage (*left*), corresponding transfer function of the $H(s)$ block (*upper right*) and overall loop gain to ensure stability (*lower right*)

To ensure stability of the feedback network, since the integrator introduces a pole at very low frequency, the phase margin is fixed only by the amplifier $H(s)$. It is given by $\phi_m = 90^\circ - \angle H(f_m)$, where $\angle H(f_m)$ denotes the phase lag introduced by $H(s)$ at $f_m$. By using an amplifier $H(s)$ with one pole at a frequency $f_p$ and one zero at a frequency $f_z > f_p$, as sketched in Fig. 3, a phase margin greater than $45^\circ$ is ensured provided that $f_z < f_m$. Under this condition, the minimum frequency amplified by the circuit is given by the following expression:

$$f_m = \frac{1}{2\pi \cdot R_{dc} C_i \cdot \gamma} \quad (2)$$

where  $\gamma$  is the attenuation of  $H(s)$  for frequencies greater than  $f_z$  and is a free parameter that can be tuned to obtain the desired value of the minimum frequency of signal,  $f_m$ , amplified by the transimpedance amplifier.
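Equation (2) can be checked numerically against Eq. (1). The following is a minimal sketch with hypothetical values: $A = 10^4$, $C_i = 100$ fF, $R_{dc} = 1$ G$\Omega$, and $H(s)$ with a dc gain of 10, a pole at 1 mHz and a zero at 1 Hz. It reads $\gamma$ as the factor by which $H(s)$ attenuates above $f_z$, so that its high-frequency gain is $1/\gamma$. This is the reading under which Eq. (2) follows from Eq. (1) for large $A$. The script finds $|G_{loop}| = 1$ by bisection and computes the phase margin directly from the loop gain.

```python
import cmath
import math

A = 1e4          # integrator OpAmp dc gain (hypothetical)
C_I = 100e-15    # integrating capacitance, F (hypothetical)
R_DC = 1e9       # reset "resistor", ohm (hypothetical)
H0 = 10.0        # dc gain of H(s), > 1 as required (hypothetical)
F_P = 1e-3       # pole of H(s), Hz (hypothetical)
F_Z = 1.0        # zero of H(s), Hz (hypothetical, f_z > f_p)

def h(f):
    """One-pole, one-zero amplifier H evaluated at s = j*2*pi*f."""
    return H0 * (1 + 1j * f / F_Z) / (1 + 1j * f / F_P)

def g_loop(f):
    """Loop gain of Eq. (1) evaluated at s = j*2*pi*f."""
    s = 2j * math.pi * f
    return h(f) * A / (1 + s * (1 + A) * C_I * R_DC)

def unity_gain_frequency(f_lo=1e-3, f_hi=1e6, iterations=100):
    """Bisection in log-frequency; |G_loop| decreases monotonically here."""
    for _ in range(iterations):
        f_mid = math.sqrt(f_lo * f_hi)
        if abs(g_loop(f_mid)) > 1:
            f_lo = f_mid
        else:
            f_hi = f_mid
    return math.sqrt(f_lo * f_hi)

gamma = F_Z / (H0 * F_P)                        # high-frequency attenuation of H(s)
f_m_eq2 = 1 / (2 * math.pi * R_DC * C_I * gamma)
f_m = unity_gain_frequency()
phase_margin = 180 + math.degrees(cmath.phase(g_loop(f_m)))

print(f"gamma = {gamma:.0f}")
print(f"f_m from Eq. (2): {f_m_eq2:.2f} Hz, numerical: {f_m:.2f} Hz")
print(f"phase margin: {phase_margin:.1f} deg")
```

With these values, both estimates of $f_m$ land near 16 Hz, and the phase margin is comfortably above $45^\circ$ because $f_z$ sits an order of magnitude below $f_m$.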

The challenges of implementing an integrated solution using a low-noise “resistance  $R_{dc}$ ” and an amplifier  $H(s)$  with pole and zero frequencies well below 100 Hz are discussed in the following sections.

## 5 Realisation of Very Large-Value Resistors

The current noise of the resistance $R_{dc}$ ($4kT/R_{dc}$) would be injected directly into the input node. To sense femtoampere currents, it is therefore essential to use an equivalent resistor in the $G\Omega$ range. Since technological limits prevent the integration of a physical resistor of such high value, an active very low-noise circuit should be chosen. Although the linearity of $R_{dc}$ does not directly affect the signal path, a non-linear element such as a simple transistor would make the frequency $f_m$ (and consequently the loop stability) dependent on the input dc current. Our solution uses a linear transconductor, as shown in the dashed ellipse on the left in Fig. 4.



**Fig. 4** Schematics of the transimpedance circuit with the active network to draw the DUT steady current (dashed ellipse on the left) and with the amplifier $H(s)$ (dotted ellipse on the right). The large value “resistor” $R_a$ is, in turn, implemented by cascading 4 current reducer systems (see Fig. 5)

The core of this system is the pair of matched MOSFETs $T_{att}$ and $T_{spill}$, each with source and well short-circuited: with negative $V_{GS}$ the device operates as a pMOS diode; with positive $V_{GS}$ the parasitic drain-well (p-n) junction is forward biased and the transistor acts as a p-n diode [25]. The matched MOSFETs have the same channel length and are biased with the same voltage, so their current density is the same irrespective of the sign of the current $I_{dc}$ flowing in $T_{spill}$. By designing $T_{att}$ M times larger than $T_{spill}$, the overall system acts as a linear and accurate current reducer by a factor M. Figure 5 (left) shows the measured I-V characteristic, demonstrating an equivalent resistance of more than $45\text{ M}\Omega$ with very good linearity over the full voltage swing of $\pm 1.5\text{ V}$, in a very small area.

The implemented scheme is very beneficial from the sensitivity point of view: the current noise of the physical resistor $R_{att}$ is injected into the input of the circuit reduced by the factor $M^2$. Choosing a reduction factor of 150 and a resistor $R_{att}$ of $300\text{ k}\Omega$, the noise injected at the input is equivalent to that of a very large resistance of about $6.75\text{ G}\Omega$ ($300\text{ k}\Omega \cdot 150^2$), although the I-V characteristic shows a resistance of “only” $45\text{ M}\Omega$ ($300\text{ k}\Omega \cdot 150$). This low-noise condition is preserved for dc currents up to the pA range. For larger $I_{dc}$ the shot noise ($2qI_{dc}$) of the $T_{spill}$ transistor, operating in the sub-threshold regime or as a p-n diode, becomes dominant. The flicker noise of $T_{spill}$ has been made negligible in the signal band by using a non-minimum-area MOSFET.
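The resistance and noise scaling of the current reducer can be reproduced with a few lines of arithmetic. This sketch uses $R_{att} = 300\text{ k}\Omega$ and $M = 150$ from the text; the temperature of 300 K is an assumption. It also estimates the dc current at which the shot noise $2qI_{dc}$ overtakes the thermal noise of the equivalent resistor.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C
T = 300.0              # K, assumed operating temperature


def current_reducer(r_att, m):
    """Return (I-V resistance, noise-equivalent resistance) of an M:1 reducer."""
    r_iv = m * r_att           # resistance seen in the measured I-V curve
    r_noise = m * m * r_att    # the noise PSD of R_att is divided by M^2
    return r_iv, r_noise


def thermal_noise_density(r):
    """Current noise density sqrt(4kT/R) in A/sqrt(Hz)."""
    return math.sqrt(4 * K_B * T / r)


def shot_crossover_current(r):
    """DC current at which the shot noise 2qI equals the thermal noise 4kT/R."""
    return 4 * K_B * T / r / (2 * Q_E)


r_iv, r_noise = current_reducer(r_att=300e3, m=150)
print(f"I-V resistance: {r_iv/1e6:.0f} MOhm, noise resistance: {r_noise/1e9:.2f} GOhm")
print(f"noise floor: {thermal_noise_density(r_noise)*1e15:.2f} fA/sqrt(Hz)")
print(f"shot noise dominates above ~{shot_crossover_current(r_noise)*1e12:.1f} pA")
```

The crossover of a few pA is consistent with the statement that the low-noise condition holds up to the pA range.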



**Fig. 5** Measured I-V characteristics of the current reducers realized with the matched MOSFET scheme, showing an equivalent resistance of more than  $45\text{ M}\Omega$  used as reset element for the standing DUT current (*left*) and of  $300\text{ G}\Omega$  used to set the very low frequency pole in the  $H(s)$  network (*right*). The insets magnify the region around zero, showing the capability of this system to handle very low currents in the fA range

## 6 Feedback Network Design

Another critical aspect of the project is the realisation of the feedback network $H(s)$, which should have (i) zeros and poles at frequencies well below 100 Hz, set by the lower limit of the signal bandwidth, and (ii) a high DC gain, $H(0)$, to keep the output of the integrator close to zero irrespective of the dc input current, ensuring the maximum input range for the signal and a high linearity of the integrator in any bias condition. We adopted a first-order filter (see Fig. 4) characterized, up to the GBP frequency of the amplifier $OP_H$, by the transfer function:

$$H(s) = \frac{A_0(1 + sC_2R_a)}{1 + sR_a[C_2 + C_1(1 + A_0)]} \quad (3)$$

where  $A_0$  is the DC gain of  $OP_H$ .

With this solution the crucial point is the design of the “resistor” $R_a$. Given a practical limit of $10\text{ pF}$ for $C_1$, a value of $f_m$ of about 100 Hz can be obtained with a $\gamma$ factor (see Fig. 3) equal to 400, which for $A_0 \gg 1$ follows from the high-frequency limit of Eq. (3), $\gamma \approx C_1/C_2$, with $C_2 = 25\text{ fF}$. To set the zero $1/(2\pi R_a C_2)$ at least one decade below $f_m$, the “resistor” $R_a$ must be of the order of hundreds of $\text{G}\Omega$. To accomplish this, a cascade of 4 current reducers similar to the one discussed in the previous section has been implemented. Figure 5 (right) reports the measured I-V characteristic and shows that with this technique it is possible to make transconductances as low as $(300\text{ G}\Omega)^{-1}$ with good linearity. Note that the circuit works properly at current levels as low as a few fA, with parasitic currents playing no role, because all MOSFET terminals are actively controlled by the operational amplifier and the N-well-substrate leakage current is supplied directly by the amplifier output. These properties are strictly necessary in this application to keep the loop stable in every bias condition.
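Eq. (3) can be evaluated directly to locate its singularities and confirm $\gamma \approx C_1/C_2$. In this sketch $C_1 = 10$ pF, $C_2 = 25$ fF and $R_a = 300\text{ G}\Omega$ come from the text, while the DC gain $A_0 = 10^4$ of $OP_H$ is an assumed value, since the text does not give it.

```python
import math


def h_of_f(f, a0, c1, c2, r_a):
    """Eq. (3) evaluated at s = j*2*pi*f."""
    s = 2j * math.pi * f
    return a0 * (1 + s * c2 * r_a) / (1 + s * r_a * (c2 + c1 * (1 + a0)))


def h_singularities(a0, c1, c2, r_a):
    """Zero, pole and high-frequency attenuation gamma = 1/|H(inf)| of Eq. (3)."""
    c_eq = c2 + c1 * (1 + a0)
    f_z = 1.0 / (2 * math.pi * r_a * c2)
    f_p = 1.0 / (2 * math.pi * r_a * c_eq)
    gamma = c_eq / (a0 * c2)
    return f_z, f_p, gamma


# C1, C2 and R_a as quoted in the text; A0 = 1e4 is an ASSUMED DC gain for OP_H.
params = dict(a0=1e4, c1=10e-12, c2=25e-15, r_a=300e9)
f_z, f_p, gamma = h_singularities(**params)
print(f"f_z = {f_z:.1f} Hz, f_p = {f_p:.2e} Hz, gamma = {gamma:.1f}")
print(f"|H(1 kHz)| = {abs(h_of_f(1e3, **params)):.2e}")
```

With a large $A_0$ the pole moves to extremely low frequency, while the high-frequency attenuation is essentially independent of $A_0$.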

## 7 Forward Amplifier Design

The forward amplifier of the integrator stage plays a major role both in the sensitivity of the instrument, as it sets the noise of the system, and in its operating capabilities, as it sets the high-frequency bandwidth of the full transimpedance amplifier and the DC voltage of the input node. This latter point motivated the use of a differential input configuration, powered at  $\pm 1.5$  V, to control the potential of the virtual ground of the integrator and to apply a very small and accurate DC voltage across the DUT without an additional voltage source. For this reason, a simple common-source configuration, typical of many low-noise applications, has not been used.

In order to achieve the desired very high sensitivity, the differential input stage uses a pair of p-MOS transistors (to minimize the flicker noise contribution in the signal bandwidth, at the cost of slightly higher white noise than n-MOS devices) and is purely resistively loaded. An active load with a current mirror would fully add its own noise: its white component would be higher than that of a simple resistor, and its 1/f component would be very high because n-MOSFETs would have to be used. The parallel (current) noise of the input pair can of course be neglected. The voltage noise, instead, plays a role, as it is amplified through the total capacitance seen from the inverting input of the OpAmp (the sum of $C_i$, the feedback capacitance; $C_{DUT}$, the equivalent capacitance of the DUT and of its interconnections; and $C_{gate}$ of the input MOSFET), giving an equivalent input current noise of:

$$\overline{i_{eq}^2} = (2\pi f)^2(C_i + C_{DUT} + C_{gate})^2 \cdot \overline{v_n^2} \quad (4)$$

Because of the frequency dependence, this noise becomes dominant at the high-frequency end of the signal bandwidth and must be minimized. Special care should consequently be taken in the design of the differential input transistors, which set both  $C_{gate}$  and  $\overline{v_n^2}$ , whose expressions are:

$$\overline{v_n^2} = 2 \cdot \left[ \frac{2}{3} \frac{4kT}{g_m} \right] = 2 \cdot \left[ \frac{2}{3} \frac{4kT}{\mu_p} \frac{L^2}{C_{gate}(V_G - V_T)} \right] \quad (5)$$

$$C_{gate} = C'_{ox} WL \quad (6)$$

where the factor 2 reflects the presence of the two MOSFETs of the differential pair, k is the Boltzmann constant, $\mu_p$ is the hole mobility in the channel, and $C'_{ox}$, W and L are respectively the gate capacitance per unit area, the channel width and the channel length of the transistors. The requirement of minimizing $C_{gate}$ given by Eq. (4) is counterbalanced by the requirement of maximizing it in Eq. (5); the optimum is obtained by substituting Eq. (5) into Eq. (4) and differentiating with respect to $C_{gate}$ (at fixed L and overdrive), which gives the following condition for the gate capacitance [26, 27]:

$$C_{gate} = C_i + C_{DUT}$$

In our case, with $C_i$ already set at 100 fF and $C_{DUT}$ estimated in the range of 0.5 pF (mainly due to the on-chip interconnections and the bonding pad), we obtained $C_{\text{gate}} = 600\text{ fF}$, leading to $L = 0.6\ \mu\text{m}$ (the minimum value that guarantees good matching of the differential pair and a good drain resistance) and $W = 220\ \mu\text{m}$. Note that the additional capacitance of the real DUT that will be connected to the bonding pad (molecules or nanometer-scale devices) does not alter this choice. Consequently, the amplifier can be used in many different applications without the need for a redesign.
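The optimum $C_{gate} = C_i + C_{DUT}$ can be verified by brute force, and Eq. (6) then gives the width. This is a sketch: $C_i$, $C_{DUT}$ and $L$ come from the text, whereas $C'_{ox} \approx 4.5\text{ fF}/\mu\text{m}^2$ is an assumed, typical value for a 0.35 µm process that the text does not state.

```python
def noise_factor(c_gate, c_ext):
    """C_gate-dependent part of Eq. (4) after substituting Eq. (5):
    (C_ext + C_gate)^2 / C_gate at fixed L and overdrive, with C_ext = C_i + C_DUT."""
    return (c_ext + c_gate) ** 2 / c_gate


def best_c_gate(c_ext, lo, hi, steps=100_000):
    """Brute-force scan of C_gate in [lo, hi] for the minimum noise factor."""
    grid = (lo + (hi - lo) * k / steps for k in range(steps + 1))
    return min(grid, key=lambda c: noise_factor(c, c_ext))


C_EXT = 100e-15 + 500e-15   # C_i + C_DUT from the text, F
COX = 4.5e-3                # F/m^2 (= 4.5 fF/um^2): ASSUMED process value
L = 0.6e-6                  # channel length from the text, m

c_opt = best_c_gate(C_EXT, lo=10e-15, hi=5e-12)
w = c_opt / (COX * L)       # Eq. (6): C_gate = C'_ox * W * L
print(f"C_gate,opt = {c_opt*1e15:.0f} fF, W = {w*1e6:.0f} um")
```

Under this assumed $C'_{ox}$ the width comes out close to the 220 µm quoted above.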

The chosen configuration of the differential input stage with simple resistive loads also ensures that the noise produced by the stage is not increased when the two inputs of the differential pair are connected to significantly different impedances. This effect is usually present in commercial OpAmps with complex input differential stages, whose noise is sensitive to the difference between the impedances seen at the two inputs [28].

The second stage of our OpAmp (see Fig. 6) is again a differential pair with active load, which injects its current into a multipath nested Miller compensation stage [29, 30] to obtain the highest possible gain with strong loop stability. The overall gain-bandwidth product of the OpAmp turned out to be 100 MHz, which, with the integrator feedback factor of about 0.1 ($C_i \approx 100$ fF over a total input capacitance of about 1 pF), provides a bandwidth of about 10 MHz to the transimpedance amplifier.
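The 10 MHz figure follows from a single-pole OpAmp model, in which the closed-loop bandwidth is roughly the GBP times the feedback factor $\beta = C_i/(C_i + C_{stray})$. The sketch below assumes that model; the 1 pF total input capacitance is the value reported in Sec. 9.

```python
def closed_loop_bandwidth(gbp, c_i, c_total):
    """Single-pole OpAmp: bandwidth ~ GBP * beta, with beta = C_i / C_total."""
    return gbp * c_i / c_total


bw = closed_loop_bandwidth(gbp=100e6, c_i=100e-15, c_total=1e-12)
print(f"integrator bandwidth ~ {bw/1e6:.0f} MHz")
```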

The output stage does not use a common “push-pull” topology because, even though it has a good driving capability, its source-follower output limits the dynamic range: the gate-source voltage cannot drop below the threshold voltage, so even if the gates of the output MOSFETs are actively driven rail-to-rail, the output voltage swing would be $V_{\text{supply}} - V_{\text{thP}} - V_{\text{thN}} \simeq 1.5\text{ V}$, too small for our purpose. The necessary capability to drive the integrator output voltage rail-to-rail is fully accomplished by a simple CMOS inverter [31].

**Fig. 6** Schematic of the OpAmp used in the integrator stage. Note the resistive load of the input stage to obtain minimum input noise

Concerning the differentiator stage, its forward amplifier has an input stage similar to the one just described. A different compensation technique has been used because the feedback capacitance is now replaced by the resistance $R_d$, which introduces a pole in the loop gain at the frequency $1/(2\pi R_d C_d) = 80\text{ kHz}$. A standard compensation, achieved by adding a feedback capacitance $C_{fd}$ in parallel to $R_d$, is not straightforward because the singularity $1/(2\pi R_d C_{fd})$ should be at a frequency greater than $10\text{ MHz}$ so as not to limit the differentiator bandwidth, which requires $C_d/C_{fd} = 10\text{ MHz}/80\text{ kHz} = 125$. This imposes a gain-bandwidth product of the operational amplifier greater than $10\text{ MHz} \cdot C_d/C_{fd} = 1.25\text{ GHz}$. Instead, without affecting the closed-loop transfer function, we introduced a zero directly in the forward path, in the Miller compensation network, a technique well known from the literature [32–34].
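The 1.25 GHz requirement can be traced step by step: both time constants share $R_d$, so $C_d/C_{fd}$ equals the ratio of the two singularity frequencies, and at high frequency the noise gain of the stage is roughly $C_d/C_{fd}$. The sketch below assumes the usual rule that the GBP must exceed the desired bandwidth times that noise gain.

```python
def differentiator_gbp_requirement(f_pole, f_singularity):
    """Return (C_d/C_fd, minimum GBP) for the capacitive-compensation option.

    Both time constants use R_d, so C_d/C_fd = f_singularity / f_pole; the
    OpAmp must provide f_singularity at a noise gain of about C_d/C_fd.
    """
    cd_over_cfd = f_singularity / f_pole
    return cd_over_cfd, f_singularity * cd_over_cfd


ratio, gbp = differentiator_gbp_requirement(f_pole=80e3, f_singularity=10e6)
print(f"C_d/C_fd = {ratio:.0f}, required GBP > {gbp/1e9:.2f} GHz")
```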

## 8 Dynamic Range Considerations

The maximum amplitude of the current signal that the circuit can process depends on the frequency and is limited by the saturation to the supply voltage of three different nodes (see Fig. 4): $\text{DC}_{\text{out}}$, the integrator output and $\text{AC}_{\text{out}}$. At very low frequencies (DC range), where the amplifier $H(s)$ has a high gain, neither the integrator output nor $\text{AC}_{\text{out}}$ is moved by the input signal; therefore, as reported in Fig. 7, the input current is limited by the saturation of $\text{DC}_{\text{out}}$ and is given by $1.5\text{ V}/R_{\text{DC}}$, that is about $25\text{ nA}$ for both negative and positive DUT currents. This condition holds as long as the gain of the amplifier $H(s)$ is larger than 1. Beyond this limit (a few tens of mHz) the integrator output becomes the limiting node, and the maximum input current decreases with frequency down to the minimum value of $1.5\text{ V}/(\gamma R_{\text{DC}}) \cong 50\text{ pA}$ after the zero of the amplifier $H(s)$ (10 Hz), where the attenuation $\gamma$ is maximum. This low value is kept until the feedback network is deactivated (100 Hz); the maximum current then increases with frequency, since the magnitude of the integrator transfer function decreases. In this rising region the maximum input current is given by $|I_{\text{max}}| = 1.5\text{ V} \cdot 2\pi f C_i$. Finally, when the differentiator gain becomes larger than 1, the saturation of the transimpedance output voltage ($\text{AC}_{\text{out}}$) becomes dominant and the maximum input current is given by $1.5\text{ V}/60\text{ M}\Omega = 25\text{ nA}$.

**Fig. 7** Maximum input current that can be accepted by the transimpedance input node as a function of the frequency of the input signal
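Above the zero of $H(s)$ the curve of Fig. 7 can be approximated by treating the active loop as a conductance $1/(\gamma R_{DC})$ in parallel with $C_i$; the two terms are equal exactly at the $f_m$ of Eq. (2). This is a simplified sketch, not the authors' model: it assumes $R_{DC} = 60\text{ M}\Omega$ (inferred from 1.5 V/25 nA), $C_i = 100$ fF, $\gamma = 400$ and a 60 MΩ mid-band transimpedance at $\text{AC}_{\text{out}}$, and it is not valid below the zero, where the $\text{DC}_{\text{out}}$ limit and the full $H(s)$ apply. With these numbers the floor is about 62 pA, the same order as the ≈50 pA quoted above.

```python
import math

V_SAT = 1.5  # V, output saturation (half of the +-1.5 V supply)


def i_max(f, r_dc=60e6, c_i=100e-15, gamma=400.0, r_t=60e6):
    """Approximate maximum input current (A) for frequencies above the H(s) zero.

    Integrator node: saturates at 1.5 V * |1/(gamma*R_dc) + j*2*pi*f*C_i|.
    AC_out node: saturates at 1.5 V / r_t (mid-band transimpedance).
    """
    y = math.hypot(1.0 / (gamma * r_dc), 2 * math.pi * f * c_i)
    return min(V_SAT * y, V_SAT / r_t)


for f in (20, 100, 1e3, 10e3, 1e6):
    print(f"{f:>9.0f} Hz: {i_max(f) * 1e12:10.1f} pA")
```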

## 9 Experimental Performance of the Circuit

The overall frequency response of the circuit, implemented in a standard  $0.35 \mu\text{m}$  CMOS technology [35] and measured in the frequency range 10 Hz–10 MHz, is reported in Fig. 8. The full bandwidth of the integrator predicted by the theoretical analysis is confirmed by the experimental result: the minimum frequency is set by Eq. (2), and the maximum frequency is limited only by the integrator loop, in agreement with its GBP of about 100 MHz and a  $C_{\text{stray}}$  of about 800 fF at the input device (giving  $C_i + C_{\text{stray}} \approx 1 \text{ pF}$).

Note in Fig. 4 that the amplifier actually has two outputs: the signal output $\text{AC}_{\text{out}}$, with bandwidth 100 Hz–5 MHz, and a dc output $\text{DC}_{\text{out}}$ that senses the voltage across $R_{\text{att}}$ and contains the frequency components below 100 Hz. Even though the feedback network is made of non-linear elements, the I-V characteristic of $\text{DC}_{\text{out}}$, reported in Fig. 5 (left), shows good linearity over a wide current range from 1 pA to 10 nA, with only a slight asymmetry between positive and negative currents. This is due to the offset between the virtual grounds $V_1$ and $V_2$ and to the mismatch between the MOSFETs $T_{\text{spill}}$ and $T_{\text{att}}$, which work as MOS diodes when the current is negative and as drain-N-well diodes when the current is positive. Despite this small asymmetry, $\text{DC}_{\text{out}}$ is well suited to monitor the bias condition of the DUT during the measurement and to track continuously the low-frequency variations of the DUT if needed.

Figure 9a shows the equivalent input noise measured on the integrator-differentiator prototype, operating with the lowest bias current. The experimental result is in agreement with the theoretical prediction given by

$$\overline{i_n^2} \approx \frac{4kT}{R_{att} M^2} + \overline{i_{T_{spill}}^2} + \frac{4kT}{R_d (C_d/C_i)^2} + \overline{e_{int}^2} \cdot \omega^2 \cdot (C_i + C_{input})^2 \quad (7)$$

where $\overline{i_{T_{spill}}^2}$ is the current noise of $T_{spill}$, $\overline{e_{int}^2} \cong (4\text{ nV})^2/\text{Hz}$ is the input voltage noise of the integrator OpAmp, and $C_{input} \cong 700\text{ fF}$ is the total capacitance at the input node of the integrator stage due to the operational amplifier, the DUT and the input stray capacitance. The thermal noise of the physical resistors $R_{att}$ and $R_d$ gives a contribution as low as $3\text{ fA}/\sqrt{\text{Hz}}$, equivalent to the thermal noise of a $2\text{ G}\Omega$ resistor. The current noise of MOSFET $T_{spill}$ depends on the input dc current: it is negligible for currents smaller than $30\text{ pA}$ and then increases with the input bias current, as shown in Fig. 9b. The increase follows the theoretical shot noise, as expected for a transistor operating in the sub-threshold regime or as a p-n diode. For large negative currents the transistor $T_{spill}$ operates in the inversion regime and the noise is correspondingly lower than the shot noise. The $1/f$ noise at frequencies below $1\text{ kHz}$ is added by the differentiator stage and is therefore independent of the input bias.

**Fig. 8** Experimental frequency response of the transimpedance amplifier

**Fig. 9** (a) Experimental equivalent input current noise when operating the amplifier with low bias current. The rise in the spectrum at higher frequencies is due to the total input capacitance. (b) Experimental white noise as a function of the input dc current
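A simplified version of Eq. (7) reproduces the main numbers of this paragraph. In this sketch the two resistor terms are lumped into the 2 GΩ equivalent quoted in the text, $C_i + C_{input} = 800$ fF and $e_{int} = 4\text{ nV}/\sqrt{\text{Hz}}$ come from the text, the temperature of 300 K is assumed, $T_{spill}$ is modeled with full shot noise (valid for positive or sub-threshold currents only), and the 1/f contribution of the differentiator is ignored.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
Q_E = 1.602176634e-19      # elementary charge, C
T = 300.0                  # K, assumed temperature
R_EQ = 2e9                 # Ohm, lumped resistor noise equivalent from the text
E_INT = 4e-9               # V/sqrt(Hz), integrator OpAmp voltage noise
C_TOT = 100e-15 + 700e-15  # F, C_i + C_input


def input_noise_psd(f, i_dc):
    """Simplified Eq. (7) in A^2/Hz: thermal + full shot noise + capacitive term."""
    thermal = 4 * K_B * T / R_EQ
    shot = 2 * Q_E * abs(i_dc)
    capacitive = (E_INT * 2 * math.pi * f * C_TOT) ** 2
    return thermal + shot + capacitive


i_cross = 4 * K_B * T / R_EQ / (2 * Q_E)   # dc current where shot = thermal
f_corner = math.sqrt(4 * K_B * T / R_EQ) / (2 * math.pi * E_INT * C_TOT)
print(f"floor = {math.sqrt(input_noise_psd(1e3, 0.0))*1e15:.2f} fA/sqrt(Hz)")
print(f"shot-noise crossover ~ {i_cross*1e12:.0f} pA")
print(f"capacitive-noise corner ~ {f_corner/1e3:.0f} kHz")
```

The crossover of a few tens of pA agrees with the 30 pA threshold observed in Fig. 9b.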

## 10 Conclusions

A transimpedance amplifier to be coupled to DUTs with sub-pF capacitance has been designed and tested. The high sensitivity and the wide bandwidth achieved make it suitable for sensing very low and fast signals (pA-level currents over a 100 kHz bandwidth) or very small capacitances, down to the attofarad, even when the biological sample under test can withstand a voltage signal of only a few millivolts.

## References

1. P. J. de Pablo, F. Moreno-Herrero, J. Colchero, J. G. Herrero, P. Herrero, A. M. Baro, P. Ordejon, J. Soler, and E. Artacho, "Absence of dc-Conductivity in  $\lambda$ -DNA", *Phys. Rev. Lett.*, Vol. 85, 2000, pp. 4992–4995.
2. L. Movileanu, S. Howorka, O. Braha, and H. Bayley, "Detecting protein analytes that modulate transmembrane movement of a polymer chain within a single protein pore", *Nat. Biotechnol.*, Vol. 18, 2000, pp. 1091–1095.
3. A. Stamouli, J. Frenken, T. Oosterkamp, R. Cogdell, and T. Aartsma, "The electron conduction of photosynthetic protein complexes embedded in a membrane", *FEBS Lett.*, Vol. 560, 2004, pp. 109–114.
4. S. M. Iqbal, G. Balasundaram, S. Ghosh, D. E. Bergstrom, and R. Bashir, "Direct current electrical characterization of ds-DNA in nanogap junctions", *Appl. Phys. Lett.*, Vol. 86, 2005, p. 153901.
5. D. K. Schroder, "Semiconductor material and device characterization", Wiley-Interscience, 1998.
6. S. M. Sze, "Physics of semiconductor devices", Wiley-Interscience, 1981.
7. D. K. Schroder, J.-E. Park, S.-E. Tan, B. D. Choi, S. Kishino, and H. Yoshida, "Frequency domain lifetime characterization", *IEEE Transactions on Electron Devices*, Vol. 47, 2000, pp. 1653–1661.
8. H. G. L. Coster, T. C. Chilcott, and A. C. F. Coster, "Impedance spectroscopy of interfaces, membranes and ultrastructures", *Bioelectrochem. Bioenerg.*, Vol. 40, 1996, pp. 79–98.
9. E. Barsoukov and J. R. Macdonald (eds.), "Impedance spectroscopy: Theory, experiment, and applications", Wiley-Interscience, 2005.
10. E. Katz and I. Willner, "Probing biomolecular interactions at conductive and semiconductive surfaces by impedance spectroscopy", *Electroanalysis*, Vol. 15, 2003, pp. 913–947.
11. M. Akeson, D. Branton, J. J. Kasianowicz, E. Brandin, and D. W. Deamer, "Microsecond time-scale discrimination among polycytidyllic acid, polyadenylic acid and polyuridylic acid as homopolymers or as segments within single rna molecules", *Biophys. J.*, Vol. 77, 1999, pp. 3227–3233.
12. C. Yin Kong and M. Muthukumar, "Simulations of stochastic sensing of proteins", *J. Am. Chem. Soc.*, Vol. 127, 2005, pp. 18252–18261.
13. M. J. Rost, L. Crama, P. Schakel, E. van To, G. B. E. M. van Velzen-Williams, C. F. Overgaauw, H. ter Horst, H. Dekker, B. Okhuijsen, M. Seynen, A. Vijftigschild, P. Han, A. J. Katan, K. Schoots, R. Schumm, W. van Loo, T. H. Oosterkamp, and J. W. M. Frenken, "Scanning probe microscopes go video rate and beyond", *Rev. Sci. Instr.*, Vol. 76, 2005, p. 053710.
14. H. H. Chowdhury, M. Kreft, and R. Zorec, "Distinct effect of actin cytoskeleton disassembly on exo- and endocytic events in a membrane patch of rat melanotrophs", *J. Physiol. (Lond.)*, Vol. 545, 2002, pp. 879–886.
15. T. M. Suchyna, S. R. Besch, and F. Sachs, "Dynamic regulation of mechanosensitive channels: capacitance used to monitor patch tension in real time", *Phys. Biol.*, Vol. 1, 2004, pp. 1–18.
16. C. Ciofi, F. Crupi, C. Pace, and G. Scandurra, "How to enlarge the bandwidth without increasing the noise in op-amp-based transimpedance amplifier", *IEEE Trans. Instrum. Meas.*, Vol. 55, 2006, pp. 814–819.
17. P. O'Connor, G. Gramegna, P. Rehak, F. Corsi, and C. Marzocca, "CMOS preamplifier with high linearity and ultra low noise for x-ray spectroscopy", *IEEE Trans. Nucl. Sci.*, Vol. 44, 1997, pp. 318–325.
18. C. Guazzoni, M. Sampietro, A. Fazzi, and P. Lechner, "Embedded front-end for charge amplifier configuration with sub-threshold MOSFET continuous reset", *IEEE Trans. Nucl. Sci.*, Vol. 47, 2000, pp. 1442–1446.
19. B. V. Amini and F. Ayazi, "A 2.5-V 14-bit Sigma-Delta CMOS-SOI capacitive accelerometer", *IEEE J. Solid-State Circuits*, Vol. 39, 2004, pp. 2467–2476.
20. M. A. Lemkin, M. A. Ortiz, N. Wongkomet, B. E. Boser, and J. H. Smith, "A 3-axis surface micromachined  $\Sigma\Delta$  accelerometer", *Proc. IEEE International Solid-State Circuits Conference Digest of Technical Papers*, 44th ISSCC, 1997, pp. 202–203, 457.
21. F. Laiwalla, K. G. Klemic, F. J. Sigworth, and E. Culurciello, "An integrated patch-clamp amplifier in silicon-on-sapphire CMOS", *IEEE Trans. Circuits and Systems – I*, Vol. 53, 2006, pp. 2364–2370.
22. G. De Geronimo, P. O'Connor, V. Radeka, and B. Yu, "Front-end electronics for imaging detectors", *Nucl. Instr. Meth. A*, Vol. 471, 2001, pp. 192–199.
23. M. Sampietro, G. Bertuccio, and L. Fasoli, "Current mirror reset for low-power BiCMOS charge amplifier", *Nucl. Instr. Meth. A*, Vol. 439, 2000, pp. 373–377.
24. G. Ferrari and M. Sampietro, "Wide bandwidth transimpedance amplifier for extremely high sensitivity continuous measurements", *Rev. Sci. Instr.*, Vol. 78, 2007, pp. 094703–7.
25. F. Gozzini, G. Ferrari, and M. Sampietro, "Linear transconductor with rail-to-rail input swing for very large time constant applications", *Electron. Lett.*, Vol. 42, 2006, pp. 1069–1070.
26. Z. Y. Chong and W. M. C. Sansen, "Low-noise wide-band amplifiers in bipolar and CMOS technologies", Springer, 1990.
27. L. Fasoli and M. Sampietro, "Criteria for setting the width of CCD front end transistor to reach minimum pixel noise", *IEEE Trans. Electron Devices*, Vol. 43, 1996, pp. 1073–1076.
28. A. Cerizza, A. Fazzi, and V. Varoli, "Performances of operational amplifiers in front-end electronics for nuclear radiation detectors", *IEEE Nuclear Science Symposium Conference*, 2004, pp. 1399–1402.
29. R. G. H. Eschauzier, L. P. T. Kerklaan, and J. H. Huijsing, "A 100-MHz 100-dB operational amplifier with multipath nested Miller compensation structure", *IEEE J. Solid-State Circuits*, Vol. 27, 1992, pp. 1709–1717.
30. F. You, S. H. K. Embabi, and E. Sanchez-Sinencio, "Multistage amplifier topologies with nested  $G_m - C$  compensation", *IEEE J. Solid-State Circuits*, Vol. 32, 1997, pp. 2000–2011.
31. J. N. Babanezhad, "A low-output-impedance fully differential op-amp with large output swing and continuous-time common-mode feedback", *IEEE J. Solid-State Circuits*, Vol. 26, 1991, pp. 1825–1833.
32. K. N. Leung, P. K. T. Mok, W.-H. Ki, and J. K. O. Sin, "Three-stage large capacitive load amplifier with damping-factor-control frequency compensation", *IEEE J. Solid-State Circuits*, Vol. 35, 2000, pp. 221–230.
33. P. K. Chan and Y. C. Chen, "Gain-enhanced feedforward path compensation technique for pole-zero cancellation at heavy capacitive loads", *IEEE Trans. Circ. Syst. II*, Vol. 50, 2003, pp. 933–941.
34. Q. Li, J. Yi, B. Zhang, and Z. Li, "A dual complex pole-zero cancellation frequency compensation with gain-enhanced stage for three-stage amplifier", *Analog Integr. Circ. Sig. Process.*, Vol. 48, 2006, pp. 175–180.
35. AMS 0.35  $\mu\text{m}$  CMOS C35B4, Austriamicrosystems AG, <http://www.austriamicrosystems.com>

# Design of High Power Class-D Audio Amplifiers

Marco Berkhout

**Abstract** Although distortion is a key characteristic of class-D audio amplifiers, in practice most design effort is spent on robustness and the mitigation of audible artifacts. This paper explores some topics from these relatively unfamiliar areas of design. The switching nature of class-D amplifiers involves problems and solutions that are specific to class-D.

## 1 Introduction

High power class-D amplifiers have become standard in many consumer electronic applications such as television sets and home-theatre systems. The most important feature of class-D amplifiers is their high efficiency, which is typically higher than 90% at full output power. This high efficiency allows very high output power with modest heat sinking; output powers in excess of 100 W per channel are no exception. Currently, class-D is also making a cautious entrance into the automotive domain, and the first integrated class-D audio amplifiers designed to operate directly from the car battery are now entering the market. Although audio amplifiers are required to have low distortion, in practice distortion is not considered a product differentiator. Certainly, the distortion has to satisfy some minimum requirement, e.g. 60 dB, but that is usually not too difficult to achieve. Ultimately, the end customer will probably not notice a few dB of improvement in distortion, but he will notice undesired noises produced by the amplifier and he will most certainly notice a malfunction. Therefore, considerable effort is made in keeping audio amplifiers quiet when they are supposed to be quiet, and intact the rest of the time.

---

M. Berkhout (✉)  
NXP Semiconductors, Nijmegen, The Netherlands  
e-mail: Marco.Berkhout@nxp.com

## 2 Class-D Basics

A basic class-D amplifier is shown in Fig. 1. At the heart of a class-D amplifier are two low-ohmic switches that alternately connect the output node to the positive or negative supply rail. Usually, some form of Pulse Width Modulation (PWM) is used to encode the audio signal. The audio signal is subsequently retrieved by means of an LC lowpass filter connected between the class-D output stage and the load.

Probably the simplest form of PWM is so-called natural sampling PWM, or NPWM [1]. An NPWM signal can easily be constructed by comparing the audio signal to a triangular reference, as shown in Fig. 2. The fundamental frequency of the triangular reference, called the carrier frequency, is usually much higher than the highest audio frequency, e.g. around 350 kHz.

Although the generation of NPWM involves a highly nonlinear comparator, the frequency spectrum of an NPWM signal does not contain harmonics of the input signal but only intermodulation products of the carrier and the input signal, i.e., NPWM is free from harmonic distortion. Assuming that the triangular reference has a sufficiently high frequency, the intermodulation products do not fold back into the audio frequency band and are filtered out by the LC lowpass filter.
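The comparator construction of Fig. 2 can be sketched in a few lines. This illustration assumes an ideal symmetric triangle in [-1, 1], a normalized input in the same range, the 350 kHz carrier mentioned above and a hypothetical 1 kHz test tone; it shows only that the average of each carrier period tracks the input, which is what the LC lowpass filter recovers. It does not by itself demonstrate the absence of harmonics.

```python
import math


def triangle(phase):
    """Symmetric triangular carrier in [-1, 1]; phase in carrier periods."""
    phase %= 1.0
    return 4.0 * phase - 1.0 if phase < 0.5 else 3.0 - 4.0 * phase


def npwm(audio, f_carrier, samples_per_period, periods):
    """Naturally sampled PWM: +1 when the audio exceeds the carrier, else -1."""
    out = []
    for k in range(samples_per_period * periods):
        t = (k + 0.5) / (samples_per_period * f_carrier)  # mid-sample time
        out.append(1 if audio(t) > triangle(t * f_carrier) else -1)
    return out


def carrier_period_means(pwm, samples_per_period):
    """Average over each carrier period, i.e. what an ideal lowpass recovers."""
    return [sum(pwm[i:i + samples_per_period]) / samples_per_period
            for i in range(0, len(pwm), samples_per_period)]


F_C = 350e3                                            # carrier from the text
tone = lambda t: 0.5 * math.sin(2 * math.pi * 1e3 * t)  # hypothetical test tone
pwm = npwm(tone, F_C, samples_per_period=1000, periods=350)  # 1 ms of signal
means = carrier_period_means(pwm, 1000)
```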

A class-D output stage can be either single-ended (SE) or differential, yielding a so-called bridge-tied-load (BTL) configuration as shown in Fig. 3. In a BTL configuration both sides of the loudspeaker load are driven in opposite phase. The BTL configuration has the advantage that it can work from a single supply while doubling the voltage swing across the load, giving four times the output power of a single-ended amplifier. Furthermore, the balanced operation cancels out even-order distortion. On the downside, a BTL amplifier needs twice the number of power switches and inductors, making it relatively expensive.

**Fig. 1** Basic class-D amplifier

**Fig. 2** Natural sampling PWM

**Fig. 3** BTL configuration with (a) AD modulation (b) BD modulation
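The factor of four follows from the doubled peak voltage across the load. This sketch assumes ideal lossless switches, full modulation with a sine wave, and a single-ended stage that swings ±V/2 around mid-rail (DC blocked); the 14.4 V supply and 4 Ω load are hypothetical example values.

```python
def max_sine_power(v_supply, r_load, btl):
    """Ideal maximum sine power: lossless switches, full modulation.

    Single-ended from a single supply swings +-V_supply/2 around mid-rail
    (DC blocked); BTL swings +-V_supply across the load.
    """
    v_peak = v_supply if btl else v_supply / 2.0
    return v_peak ** 2 / (2.0 * r_load)


p_se = max_sine_power(14.4, 4.0, btl=False)   # hypothetical 14.4 V, 4 Ohm
p_btl = max_sine_power(14.4, 4.0, btl=True)
print(f"SE: {p_se:.2f} W, BTL: {p_btl:.2f} W, ratio {p_btl / p_se:.1f}")
```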

In a BTL class-D amplifier the phases of the carriers of both bridge halves can be chosen independently. When the carriers are in opposite phase, as shown in Fig. 3(a), this is called AD-modulation. The main advantage of AD-modulation is that the output signal has zero common mode, since the bridge halves always switch simultaneously in opposite directions. Conversely, when the carriers are in phase, as shown in Fig. 3(b), this is called BD-modulation. Compared to AD-modulation, BD-modulation is much less sensitive to clock jitter [2]. In practice both modulation types are used.
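The difference between the two schemes can be seen by simulating one carrier period of both bridge halves. This sketch assumes that half B is driven with the inverted audio signal and that both halves use ideal comparators against a symmetric triangle; with an opposite-phase carrier (AD) the common mode stays at zero, while with an in-phase carrier (BD) the differential output becomes three-level and is zero at idle.

```python
def triangle(phase):
    """Symmetric triangular carrier in [-1, 1]; phase in carrier periods."""
    phase %= 1.0
    return 4.0 * phase - 1.0 if phase < 0.5 else 3.0 - 4.0 * phase


def bridge_halves(x, mode, n=1000):
    """Switch states (+1/-1) of bridge halves A and B over one carrier period.

    Half A compares x with the carrier; half B compares the inverted input -x
    with the carrier in opposite phase ("AD") or in phase ("BD").
    """
    a_states, b_states = [], []
    for k in range(n):
        tri = triangle((k + 0.5) / n)
        carrier_b = -tri if mode == "AD" else tri
        a_states.append(1 if x > tri else -1)
        b_states.append(1 if -x > carrier_b else -1)
    return a_states, b_states


def differential_and_common(a_states, b_states):
    """Per-sample differential (A-B)/2 and common-mode (A+B)/2 outputs."""
    diff = [(a - b) / 2 for a, b in zip(a_states, b_states)]
    common = [(a + b) / 2 for a, b in zip(a_states, b_states)]
    return diff, common


d_ad, c_ad = differential_and_common(*bridge_halves(0.3, "AD"))
d_bd, c_bd = differential_and_common(*bridge_halves(0.3, "BD"))
print("AD mean diff:", sum(d_ad) / len(d_ad), "levels:", sorted(set(d_ad)))
print("BD mean diff:", sum(d_bd) / len(d_bd), "levels:", sorted(set(d_bd)))
```

In both cases the average differential output equals the input; only the switching pattern and the common mode differ.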

### 2.1 Amplifier Architectures

Two main amplifier architectures can be distinguished. The first is the open-loop architecture, often referred to as *full-digital*. In open-loop class-D amplifiers the PWM signal is generated in the digital domain and then used to drive a class-D output stage. Open-loop class-D amplifiers can achieve very good performance but require a very good power supply, since they have poor supply rejection [3]. Because full-digital class-D amplifiers are generally configured in BTL, the effect of supply variations is cancelled out differentially as long as no signal is applied. Open-loop class-D amplifiers can be found in many consumer applications such as home-theatre systems but are unacceptable for car audio because the car battery tends to be rather noisy.

A straightforward method to improve supply rejection is to use feedback, which defines the second class-D amplifier architecture. Besides providing supply rejection, the feedback loop also corrects for timing and amplitude errors in the class-D power stage. Feedback class-D amplifiers can be further subdivided into self-oscillating and fixed-frequency amplifiers. Self-oscillating class-D amplifiers do not require an external clock reference but are essentially oscillators by themselves. Although self-oscillating amplifiers can have extremely good audio performance, it is difficult to combine more than one audio channel [4]. This is because the switching frequency of self-oscillating class-D amplifiers is not constant but depends on the output signal. When two adjacent channels switch at slightly different frequencies, this can lead to audible intermodulation products or beat tones. This problem does not occur in fixed-frequency feedback class-D amplifiers, which use an external clock reference to determine the PWM carrier frequency [5]. A major disadvantage of feedback class-D amplifiers is that they require an analog input signal, whereas nowadays most audio sources are digital.

## 3 Robustness

Robustness is an absolute necessity for any integrated circuit but is especially challenging in audio power amplifiers, since the outputs are usually directly accessible from the outside world. Consequently an audio power amplifier needs to be able to survive all sorts of abuse, including short circuits across the load and from the output to the supply rails, the dynamic and nonlinear behavior of loudspeakers and, in the case of class-D amplifiers, resonating output filters and inductor saturation at high output currents. Usually an over-current protection is used to prevent the amplifier from being damaged if the output current exceeds a safe limit value [6]. Up to that limit the output stage needs to switch reliably even if the application around the amplifier is not optimal. For example, it is not always possible to place effective supply decoupling as close to the output stage as one would like. To reduce cost, many electronics manufacturers prefer through-hole packages with long leads over surface-mounted packages, and cheap single-layer printed circuit boards are also commonly used.

### 3.1 Power Transistor Design

Design of a class-D output stage begins with the design of a proper power transistor. High power class-D amplifier ICs are manufactured in dedicated BCD technologies [7, 8]. The DMOS transistors in these technologies are very well suited to serve as power switches. In Fig. 4 a class-D output stage is shown in more detail. Two very large DMOS power transistors  $M_L$  and  $M_H$  are used as switches. The backgate diodes of the DMOS transistors serve as fly-back diodes. The gate driver of the lowside DMOS power transistor  $M_L$  uses an externally decoupled supply  $V_{REG}$ . The highside gate driver is supplied from an external bootstrap capacitor  $C_{BOOT}$ .

**Fig. 4** Class-D output stage

The bootstrap capacitor is recharged from  $V_{REG}$  through a diode  $D_{BOOT}$  each time the output node  $V_{OUT}$  is switched to the negative supply  $V_{SSP}$ . The switching of the output is controlled by a logic circuit that operates from a separate digital supply. The communication between the gate drivers and the switch control logic is handled by level shifters.

The power transistors in the output stage usually occupy at least half of the die area. As a result, the optimization of these power transistors is an important aspect of the design. The DMOS power transistors need a low on-resistance  $R_{DSon}$ , e.g.  $100\text{ m}\Omega$ , to achieve high efficiency. On the other hand, the power transistors need a sufficiently high breakdown voltage to survive the inductive voltage peaks that occur when switching large currents. In a typical BCD process, the  $R_{DSon} \cdot \text{Area}$  product scales approximately linearly with breakdown voltage.
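The trade-off can be made concrete with two first-order estimates. The sketch counts only $R_{DSon}$ conduction losses (switching and gate-drive losses are ignored), uses the 100 mΩ example from the text with a hypothetical 4 Ω load, and reads the linear $R_{DSon} \cdot \text{Area}$ scaling as area ∝ breakdown voltage / $R_{DSon}$.

```python
def conduction_efficiency(r_load, r_dson, btl=True):
    """Load power / total power when only R_DSon conduction losses are counted.

    In BTL the load current always flows through one highside and one lowside
    switch in series; a single-ended stage has one switch in the path.
    """
    r_path = (2 if btl else 1) * r_dson
    return r_load / (r_load + r_path)


def relative_area(bv_ratio, rdson_ratio):
    """Area ratio implied by R_DSon * Area ~ breakdown voltage (first order)."""
    return bv_ratio / rdson_ratio


eta = conduction_efficiency(r_load=4.0, r_dson=0.1)   # hypothetical 4 Ohm load
print(f"conduction-only efficiency: {eta:.1%}")
print(f"area for 2x breakdown at equal R_DSon: x{relative_area(2.0, 1.0):.0f}")
```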

### 3.2 Transient Dissipation

At high output currents the peak dissipation in the power transistors can become extremely high during output transitions. As can be seen in the class-D output stage shown in Fig. 4, the gates of the power transistors $M_H$ and $M_L$ are driven by inverters $M_{PH/NH}$ and $M_{PL/NL}$. The dimensioning of these inverters, together with the parasitic capacitances of the power transistors, determines the dynamic behavior of the class-D output stage [9, 10]. The inductor in the demodulation filter forces the output current $I_{OUT}$ to remain nearly constant during output transitions. The output current $I_{OUT}$ can flow either towards or away from the output stage during both rising and falling output transitions. This yields four possible transition scenarios. Figure 5 shows some typical waveforms that occur during a rising and a falling transition at the output while a high current, e.g. 10 A, is flowing towards the class-D output stage. Note that when the direction of the output current is reversed the transition behavior is similar, but the roles of highside and lowside are interchanged.

The top graphs show the gate-source voltages  $V_{GSL}$  and  $V_{GSH}$  of the power transistors  $M_L$  and  $M_H$  respectively. The middle graphs show the drain-source voltage and the drain current of the lowside power transistor  $M_L$ . Finally in the bottom graphs the instantaneous power dissipation in the lowside power transistor  $M_L$  is shown.

In Fig. 5(a) a rising transition is shown. Three phases can be distinguished in the transition. First the gate of the lowside power transistor $M_L$ is discharged rapidly while the gate of the highside transistor $M_H$ is charged. As soon as $V_{GSL}$ approaches the threshold voltage level $V_{TH}$, the output current $I_{OUT}$ pulls the output node $V_{OUT}$ up, entering the second phase. The transition is so fast that the lowside power transistor $M_L$ is not switched off completely but remains conducting during the transition. The speed of the transition is actually governed by the size of driver transistor $M_{NL}$ and gate-drain capacitance $C_{GDL}$. As can be seen, the gate-source voltage $V_{GSL}$ stalls during the transition. The gate-source voltage $V_{GSH}$ of the highside power transistor $M_H$ initially starts to rise but is then pushed down through $C_{GDH}$. After the transition has finished, the third phase starts, in which the gate of the highside power transistor $M_H$ is charged to its final value. This scenario is known as *soft switching*.

**Fig. 5** Output transition with high output current flowing in: (a) rising edge, (b) falling edge

During the transition phase a dissipation peak occurs since there is both current through and voltage across the lowside power transistor  $M_L$ . Although the peak dissipation is quite high, it pales in comparison to what happens during the falling transition shown in Fig. 5(b).

In the first phase the gate of the highside power transistor  $M_H$  is discharged causing the output current to flow through the backgate diode of  $M_H$ . At the same time the gate of the lowside power transistor  $M_L$  is charged. As  $V_{GSL}$  reaches the threshold level the lowside power transistor  $M_L$  starts to conduct but the output node  $V_{OUT}$  remains at the highside until the current through  $M_L$  matches the output current  $I_{OUT}$ . Meanwhile,  $V_{GSL}$  increases further. As the output node  $V_{OUT}$  is pulled down the voltage across the (conducting) back-gate diode is reversed. This results in a reversal of the diode current due to a well-known effect called *reverse recovery*. In the lowside power transistor  $M_L$  the reverse recovery current adds to the output current causing  $V_{GSL}$  to increase even further. The reverse recovery current tends to stop quite abruptly when the diode runs out of minority carriers. However, by then the gate-source voltage  $V_{GSL}$  of the lowside power transistor  $M_L$  has reached a value that corresponds to the output current plus the reverse recovery current. As a result the output node  $V_{OUT}$  is initially pulled down very fast. Then feedback through the gate-drain capacitance  $C_{GDL}$  causes  $V_{GSL}$  to be pushed down causing a characteristic ‘overshoot’ in  $V_{GSL}$ . Also the gate of the highside power transistor  $M_H$  is pulled up which can lead to additional peak current if the threshold voltage  $V_{TH}$  is exceeded. After this rapid start the transition is continued at a more moderate pace during the second phase. Finally after the transition has finished the charging of the gate of lowside power transistor  $M_L$  is finalized in the third phase. This scenario is known as *hard switching*. As can be seen in the bottom graph of Fig. 5(b) the dissipation  $P_L$  in the lowside power transistor  $M_L$  is huge. 
The peak dissipation is the product of the output current plus reverse recovery current and the supply voltage, e.g.  $13\text{ A} \times 60\text{ V} = 780\text{ W}$ ! The total dissipation depends on the duration of the transition.
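The peak value follows directly from the product above. The energy per edge depends on the transition duration, which the text does not quantify. The sketch below therefore assumes a hypothetical 20 ns triangular dissipation pulse and one hard-switched edge per PWM period. It also splits the 13 A into 10 A of output current plus 3 A of reverse recovery current, and borrows the 325 kHz switching frequency from the filter example later in this chapter.

```python
def hard_switching_loss(i_out, i_rr, v_supply, t_pulse, f_pwm):
    """Peak dissipation and a crude energy estimate for one hard-switched edge.
    Hypothetical model: the dissipation pulse is treated as a triangle of width t_pulse."""
    p_peak = (i_out + i_rr) * v_supply   # full current and full voltage at the same time
    e_edge = 0.5 * p_peak * t_pulse      # area of the assumed triangular pulse
    p_avg = e_edge * f_pwm               # one hard-switched (falling) edge per PWM period
    return p_peak, e_edge, p_avg

p_peak, e_edge, p_avg = hard_switching_loss(i_out=10.0, i_rr=3.0, v_supply=60.0,
                                            t_pulse=20e-9, f_pwm=325e3)
print(f"P_peak = {p_peak:.0f} W, E_edge = {e_edge * 1e6:.1f} uJ, P_avg = {p_avg:.2f} W")
```

Under these assumptions a few watts of average dissipation come from the hard-switched edge alone. This is why the transition speed trades off against reverse recovery current, as discussed next.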

The occurrence of reverse recovery and especially the rapid decrease of the reverse recovery current is probably the most important source of EMI in class-D amplifiers. The magnitude of the reverse recovery current depends on how fast the current in the lowside power transistor increases but can easily exceed the initial forward current through the diode. Here a tradeoff needs to be made. On one hand, slowing down the transition reduces the reverse recovery current and reduces EMI but on the other hand this increases the power dissipation.

### 3.3 Inductive Voltage Peaking

The situation becomes more complicated when parasitic inductances are taken into account, as shown in Fig. 6. These inductances are caused by bonding wires, lead fingers and PCB tracks and can easily add up to tens of nanohenries.

Especially the inductances  $L_{SSP}$  and  $L_{DDP}$  in series with the power supply have a significant influence because the current flowing through these inductors is switched on and off during normal operation.

Consider, for example, what happens during a falling transition while a large current is flowing out of the class-D output stage. Initially the highside power transistor  $M_H$  is conducting and the current is drawn from the positive supply  $V_{DDP}$  through inductance  $L_{DDP}$ . As the falling transition begins the current through the inductance decreases rapidly causing a positive voltage excursion at the drain of the highside power transistor  $M_H$ . When the output voltage reaches the negative supply rail  $V_{SSP}$  the backgate diode of the lowside power transistor  $M_L$  starts conducting, causing a rapid increase in current towards the negative supply  $V_{SSP}$  through inductance  $L_{SSP}$ . This in turn causes a negative voltage excursion at the source of the lowside power transistor  $M_L$ . The combination of these excursions can cause the total voltage across the class-D output stage to exceed the breakdown voltage of the power transistors, damaging the circuit. Besides the power transistors, also the circuits in the lowside driver that are supplied from the  $V_{REG}$  node are at risk. This is because the external decoupling capacitor  $C_{REG}$  is in series with inductance  $L_{SSP}$ . Consequently, the negative voltage excursions at the source of the lowside power transistor  $M_L$  also appear directly across the lowside driver circuits.
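The size of each excursion follows from $V = L \cdot di/dt$. The sketch below uses hypothetical values: 20 nH in each supply connection (within the "tens of nanohenries" quoted above) and 10 A commutated in 10 ns. It also assumes, as a worst case, that the two excursions overlap so that they add across the output stage.

```python
def inductive_spike(l_par, delta_i, delta_t):
    """Voltage excursion V = L * di/dt across a parasitic inductance (linear current ramp assumed)."""
    return l_par * delta_i / delta_t

v_ddp = inductive_spike(l_par=20e-9, delta_i=10.0, delta_t=10e-9)  # positive excursion at the drain of M_H
v_ssp = inductive_spike(l_par=20e-9, delta_i=10.0, delta_t=10e-9)  # negative excursion at the source of M_L
print(f"each excursion: {v_ddp:.0f} V, worst-case extra stress across the stage: {v_ddp + v_ssp:.0f} V")
```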



**Fig. 6** Parasitic inductances

An effective way to prevent damage to the circuits is to use voltage clamps. The voltage across the lowside and highside driver circuits can be clamped using snapback ESD devices such as grounded-gate NMOS transistors. The energy involved in clamping inductive voltage peaks is quite small compared to an ESD event. A 2000 V Human Body Model discharge releases 200 $\mu\text{J}$, whereas a 10 A current through a 20 nH inductor stores only 1 $\mu\text{J}$. For clamping the output stage, an active clamp as shown in Fig. 7 can be used. The ignition voltage of this circuit is determined by the number of Zener diodes in the stack and can be tuned to match the breakdown voltage of the power transistors. In this way, power transistors can be used with minimal overhead in breakdown voltage.
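Both energy figures can be checked from the stored-energy formulas $\tfrac{1}{2}CV^2$ and $\tfrac{1}{2}LI^2$. For the Human Body Model, the standard 100 pF discharge capacitor is used.

```python
def capacitor_energy(c, v):
    """Energy stored in a capacitor, E = C * V^2 / 2."""
    return 0.5 * c * v ** 2

def inductor_energy(l, i):
    """Energy stored in an inductor, E = L * I^2 / 2."""
    return 0.5 * l * i ** 2

e_hbm = capacitor_energy(100e-12, 2000.0)   # HBM: 100 pF charged to 2 kV
e_ind = inductor_energy(20e-9, 10.0)        # 10 A flowing in a 20 nH parasitic inductance
print(f"HBM: {e_hbm * 1e6:.0f} uJ, inductive peak: {e_ind * 1e6:.0f} uJ")
```

The factor of 200 between the two is why devices sized for ESD can also absorb the inductive peaks.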

The large voltage excursions during high current switching make it necessary to include parasitic inductances in the circuit simulations. Figure 8 shows a falling and rising output transition with and without the effect of parasitic inductance. For comparison a measured output transition under the same conditions is shown.

As can be seen, parasitic inductances have a significant influence on the output transitions. The measurement confirms that voltage excursions in the range of tens of volts are realistic. Under such circumstances it is quite possible that parts of the circuit do not behave as desired. Especially the gate drivers and levelshifters in the class-D output stage are likely candidates for malfunction. In practice, a coarse model of the parasitic inductances is sufficient to track down potential malfunctions in the circuit.

**Fig. 7** Active clamp circuit

**Fig. 8** Effect of parasitic inductance

## 4 Audible Artifacts

Besides robustness, much effort in the design of high-power class-D amplifiers is spent on audible artifacts, or rather on their prevention or minimization. This section discusses some topics in this area. As a vehicle for the discussion, the second-order class-D feedback loop shown in Fig. 9 is used.

The loop has two integrators configured around amplifiers $g_{m1}$ and $g_{m2}$. The input signal is converted to a current $I_{IN}$ by a linear VI converter $g_{m0}$ and injected into the virtual ground of the first integrator $g_{m1}$. The load $R_O$ is connected to the amplifier by means of a low-pass LC filter. Analysis of the loop gain and stability of this feedback loop is presented in detail in [8] and is beyond the scope of this paper. However, as an introduction to the following paragraphs, some characteristics of the signals inside the loop are shown in Fig. 10.

The PWM output voltage  $V_{OUT}$  is converted to a current  $\pm I_{PWM}$  through feedback resistor  $R_1$  and injected into the virtual ground of the first integrator  $g_{m1}$ . This yields a triangular wave  $V_P$  at the output of the first integrator. A reference clock signal  $osc$  is converted to a square wave current  $\pm I_{OSC}$  that is injected into the virtual ground of the second integrator  $g_{m2}$ . This yields a second triangular wave  $V_M$  at the output of the second integrator. This second triangular wave serves as reference triangle for NPWM generation as presented earlier. The oscillator current  $I_{OSC}$  is made proportional to the supply voltage so that the amplitude of both triangular waves is also proportional to the supply voltage.

The triangular signals $V_P$ and $V_M$ are fed to the non-inverting and inverting inputs of a comparator $A_0$. When the triangular waves intersect, the comparator output signal $pwm$ changes state and the output $V_{OUT}$ of the amplifier switches, yielding the desired PWM signal. Note that the peaks of signal $V_M$ coincide with the edges of the **osc** signal and the peaks of signal $V_P$ coincide with the edges of the **pwm** signal.

**Fig. 9** Class-D feedback loop topology

**Fig. 10** Integrator signals $V_P$ and $V_M$, oscillator **osc** and comparator output **pwm** during: (a) zero input signal (b) positive input signal (c) negative input signal

Figure 10(a) shows the triangular wave signals  $V_P$  and  $V_M$  when no input signal is applied yielding a PWM output signal with a 50% duty-cycle. Figure 10(b) shows the same signals when a positive input signal is applied. The input signal causes the slopes of  $V_P$  to change. The shape of  $V_M$  remains (almost) the same but the DC-level is shifted with respect to zero. The output signal now has a duty-cycle greater than 50%. The opposite happens for a negative input signal as shown in Fig. 10(c). In this manner a linear relation is realized between the input signal and the duty-cycle of the output signal  $V_{OUT}$ .
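The linear relation follows from charge balance at the virtual ground of the first integrator. In steady state the input current $I_{IN}$ and the average of the fed-back current $\pm I_{PWM}$ must cancel. With the sign convention that a positive input raises the duty cycle $D$, this gives $D = \tfrac{1}{2}(1 + I_{IN}/I_{PWM})$. The sketch below assumes ideal integrators and no loop delay. It also shows that no valid duty cycle exists once $|I_{IN}| > I_{PWM}$, which is the clipping condition discussed in Section 4.2.

```python
def duty_cycle(i_in, i_pwm):
    """Steady-state duty cycle from charge balance at the first integrator's virtual ground.
    Returns (duty, clipping); assumes ideal integrators and no loop delay."""
    d = 0.5 * (1.0 + i_in / i_pwm)    # average of +/-I_PWM over a period must cancel I_IN
    clipping = not 0.0 <= d <= 1.0    # |I_IN| > I_PWM: no duty cycle can restore balance
    return min(max(d, 0.0), 1.0), clipping

for i_in in (0.0, 0.5, -0.5, -1.5):   # currents normalized to I_PWM
    print(i_in, duty_cycle(i_in, i_pwm=1.0))
```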

### 4.1 Startup Noise

A common problem with audio amplifiers is the occurrence of noise in the loudspeaker when the amplifier is switched on. Considerable design effort is usually needed to reduce this startup noise or ‘plop’. In the class-D feedback amplifier several mechanisms can contribute to startup noise.

Usually, the dominant mechanism is related to the equivalent input offset voltage of the amplifier. An example of a linear VI-converter circuit is shown in Fig. 11. Offset is caused by mismatches in the VI-converter circuit, primarily in the conversion resistors, or by components in the external application. Because this offset is amplified by the closed-loop gain, e.g. 30 dB, it can easily add up to several tens of millivolts at the output. If this offset voltage is applied instantaneously at the output, it produces a loud low-pitched ‘plop’.
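As a worked example, the output offset is the input-referred offset multiplied by the closed-loop voltage gain. The 1 mV input-referred offset below is a hypothetical value, and the 30 dB gain is the example from the text.

```python
def output_offset(v_os_in, gain_db):
    """Output offset = input-referred offset times the closed-loop voltage gain (dB -> V/V)."""
    return v_os_in * 10 ** (gain_db / 20)

print(f"{output_offset(1e-3, 30) * 1e3:.1f} mV")   # 1 mV input offset, 30 dB gain
```

A 30 dB gain is a factor of about 31.6, so even 1 mV of input offset already gives roughly 32 mV at the output.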

An effective way to avoid offset-related startup noise is to apply the offset gradually. This can be done by slowly increasing the gain of the VI-converter from zero to full. In this way the plop becomes inaudible. A straightforward way to implement gain control in a linear VI-converter is to use the gain cell $Q_3$ to $Q_6$ shown in Fig. 11. The gain control is also used to mute the amplifier.

**Fig. 11** Linear VI converter with gain control
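The text only states that the gain is raised slowly from zero to full. The sketch below shows one possible ramp profile. Both the raised-cosine shape and the 100 ms ramp time are assumptions, as is the 31.6 mV output offset it scales. A smooth profile replaces the abrupt offset step, which is the cause of the plop, with a slow drift of the output DC level.

```python
import math

def soft_start_gain(t, t_ramp):
    """Hypothetical raised-cosine gain ramp from 0 to 1 over t_ramp seconds."""
    if t <= 0.0:
        return 0.0
    if t >= t_ramp:
        return 1.0
    return 0.5 * (1.0 - math.cos(math.pi * t / t_ramp))

t_ramp = 0.1       # assumed 100 ms ramp
v_os_out = 31.6e-3 # assumed output offset at full gain
for t in (0.0, 0.025, 0.05, 0.075, 0.1):
    v = soft_start_gain(t, t_ramp) * v_os_out
    print(f"t = {t * 1e3:5.1f} ms  offset at output = {v * 1e3:5.2f} mV")
```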

Offset-related startup noise is not exclusive to class-D amplifiers but occurs in other amplifier classes as well. There are, however, also startup noise mechanisms that are exclusive to class-D.

First, when the amplifier is started, the initial condition of the integrators in the loop is undefined and usually not even near the steady-state region. Therefore, the loop needs some time to settle. Since the class-D output stage needs to be active during this settling, this can lead to audible noise in the loudspeaker. Ideally, the output of the amplifier would produce a stable square-wave signal with 50% duty cycle directly after startup. A straightforward improvement would be to reset the integrators by short-circuiting their capacitors before startup, but this is not sufficient. As can be seen in Fig. 10(a), in steady state the integrator outputs are never zero at the same time.

A better solution is to use a secondary feedback loop. As explained earlier, the signal that is fed back to the virtual ground of the first integrator in the loop is a square-wave current $I_{PWM}$ whose amplitude is proportional to the supply voltage. This current can be mimicked by a switchable current source $I_{SILENT}$ that is controlled by the same comparator output that controls the class-D power stage. This results in the system shown in Fig. 12, which operates as follows. As long as the class-D power stage is not enabled, no current is fed back through resistor $R_1$. In this case the switched current source is enabled and a current $\pm I_{SILENT}$ is fed back to the virtual ground. Provided that $I_{SILENT}$ equals $I_{PWM}$, the loop cannot distinguish this situation from the one in which the class-D power stage is enabled and the switched current source is disabled. Consequently, the loop converges to steady state within a few clock cycles. Then the switched current source $I_{SILENT}$ is disabled and, simultaneously, the class-D power stage is enabled. This secondary feedback configuration is called the silent loop, since it operates only while the class-D power stage is disabled.

**Fig. 12** Silent feedback loop

A similar startup noise mechanism is linked to the demodulation filter. Before startup the output of the class-D output stage is usually in a high-ohmic state. Even if the class-D amplifier is able to produce a perfect PWM signal with 50% duty cycle directly after startup, this still produces a response in the loudspeaker, because the demodulation filter needs to settle to a new steady state. The only remaining degree of freedom to minimize that response is the phase at which the class-D output stage is enabled.

In order to determine the optimal starting phase consider the setup shown in Fig. 13. Here the voltage source  $V_{OUT}$  generates a square wave with frequency  $f_{PWM}$  much higher than the cutoff frequency  $f_o$  of the LC filter. Initially, switch  $S$  is open and no energy is stored in the filter, i.e. both inductor current  $I_L$  and capacitor voltage  $V_C$  are zero.

When switch $S$ is closed and the filter has settled to steady state, the inductor current $I_L$ and capacitor voltage $V_C$ change periodically. The energy stored in the inductor and capacitor is expressed as:

$$E_{LC} = \frac{1}{2}L \cdot I_L^2 + \frac{1}{2}C \cdot V_C^2$$

**Fig. 13** Equivalent circuit during startup

In steady state the total energy stored in the filter also changes periodically. The optimal time to close the switch is in the phase of the source signal where the stored energy reaches the minimal value. In the stopband of the filter the energy stored in the components in an LC-filter decreases rapidly from source to load and is dominated by the element nearest to the source. In a second order filter the inductor dominates the total stored energy. Therefore the stored energy is minimal when the inductor current  $I_L$  is zero as illustrated in Fig. 14.

In this example $L = 22\ \mu\text{H}$, $C = 680\text{ nF}$, $f_{PWM} = 325\text{ kHz}$ and the amplitude of $V_{OUT}$ is 30 V. The top graph shows the steady-state inductor current $I_L$ and capacitor voltage $V_C$. The bottom graph shows the PWM signal $V_{OUT}$ and the instantaneous energy $E_{LC}$ that is stored in the filter. As can be seen, the inductor current $I_L$ becomes zero twice in each period, almost exactly halfway between two edges of the PWM signal. Consequently, the optimal startup moment is at 1/4 or 3/4 of the PWM period. This reasoning also applies to higher-order demodulation filters.
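Fig. 14 is not reproduced here, but the result can be checked numerically with the example values above. The sketch evaluates $E_{LC}$ over one period of a 50% square wave and locates its minimum. It makes two simplifying assumptions. First, the inductor current is computed as if $V_C$ were negligible, which is reasonable since $f_{PWM}$ is about eight times $f_o$. Second, the load current is neglected, so $V_C$ is just the running integral of $I_L/C$ with its mean removed.

```python
import math

def lc_startup_phase(L=22e-6, C=680e-9, f_pwm=325e3, v_amp=30.0, n=1000):
    """Steady-state energy in an LC demodulation filter driven by a 50% square wave.
    Assumes V_C << V_OUT when computing I_L and neglects the load current.
    Returns (f_o, phase of minimum stored energy as a fraction of T, max E_L / max E_C)."""
    f_o = 1.0 / (2 * math.pi * math.sqrt(L * C))
    T = 1.0 / f_pwm
    dt = T / n
    slope = v_amp / L                  # dI_L/dt while V_OUT = +/- v_amp
    i_pk = v_amp * T / (4 * L)         # peak of the triangular ripple current
    # Rising edge of V_OUT at t = 0, falling edge at t = T/2
    i_l = []
    for k in range(n):
        t = k * dt
        if t < T / 2:
            i_l.append(-i_pk + slope * t)
        else:
            i_l.append(i_pk - slope * (t - T / 2))
    # Capacitor voltage: running integral of I_L / C, shifted to zero mean
    v_c, acc = [], 0.0
    for i in i_l:
        acc += i * dt / C
        v_c.append(acc)
    mean = sum(v_c) / n
    v_c = [v - mean for v in v_c]
    e_l = [0.5 * L * i * i for i in i_l]
    e_c = [0.5 * C * v * v for v in v_c]
    e_lc = [a + b for a, b in zip(e_l, e_c)]
    k_min = min(range(n), key=lambda k: e_lc[k])
    return f_o, k_min / n, max(e_l) / max(e_c)

f_o, phase, ratio = lc_startup_phase()
print(f"f_o = {f_o / 1e3:.1f} kHz, minimum energy at {phase:.3f} T, peak E_L/E_C = {ratio:.0f}")
```

The minimum falls at a quarter period from an edge. The peak inductor energy is roughly two orders of magnitude larger than the peak capacitor energy, which is consistent with the statement that the element nearest the source dominates.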

The optimal startup moment, halfway between edges of the PWM output signal, can be readily derived from the signals available inside the feedback loop shown earlier in Fig. 10. As can be seen, the output signal of the first integrator $V_P$ crosses zero exactly halfway between two edges of the PWM output signal. Since the silent loop is equivalent to the main loop, the internal signals in silent mode are identical to those in normal mode. An additional comparator can easily detect the zero crossings of the first integrator.

**Fig. 14** Optimal startup moments

**Fig. 15** Simulated startup responses: (a) using integrator reset, (b) using silent loop

In Fig. 15 the effect of the silent start is demonstrated. For comparison, Fig. 15(a) shows the case that the integrators are reset until the startup moment. As can be seen the feedback loop settles within a few clock cycles but a significant post-filter response is produced. In Fig. 15(b) the silent loop is used prior to startup yielding a minimal post-filter response.
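The handover can be sketched as follows. The silent loop runs for a settling interval, and the power stage is then enabled, with $I_{SILENT}$ disabled, at the next zero crossing of $V_P$. In this sketch the sampled waveform, the settling length and the helper name `startup_index` are all hypothetical. The sign test stands in for the additional comparator on the first integrator output.

```python
def startup_index(v_p, settle_samples):
    """Hypothetical helper: first sample index at or after settle_samples where V_P changes sign,
    i.e. where the zero-crossing comparator would trigger the switch-over from silent to normal mode."""
    for k in range(max(settle_samples, 1), len(v_p)):
        if (v_p[k - 1] < 0.0) != (v_p[k] < 0.0):
            return k
    return None

# Synthetic triangle standing in for V_P in silent mode (8 samples per PWM period)
v_p = [-1.0, -0.5, 0.5, 1.0, 1.0, 0.5, -0.5, -1.0] * 4
print(startup_index(v_p, settle_samples=16))   # enable the power stage at this sample
```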

In Fig. 16 the measured startup response of a class-D amplifier IC is shown. As can be seen, there is still a small settling response across the load. This is mainly caused by the delay mismatch between the silent feedback and the main feedback, which causes a slight deviation from the optimal startup timing of the class-D power stage. The remaining startup response is audible as a soft high-pitched ‘tick’.

**Fig. 16** Measured startup response using silent loop

When a BTL configuration is used with AD modulation, the startup noise problem is basically the same as for the SE configuration. However, with BD modulation, startup noise is much less sensitive to the characteristics of the switching pattern at startup, as long as both bridge halves produce identical startup patterns. In this case no differential-mode voltage appears across the load and hence no plop is perceived.

### 4.2 Clipping Recovery

A common requirement for audio amplifiers is that the output can be driven rail-to-rail. In fact, it is customary to specify the output power of audio amplifiers at 10% distortion, meaning that the output signal is clipping about 40% of the time. When the output is clipping, the feedback loop loses control over the signal and the integrators inside the loop diverge from their operating point. When the output signal returns to normal, the feedback loop needs some time to recover.
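The "about 40%" figure can be checked for an idealized case. Take a pure sine, hard-clipped symmetrically at a level $c$ relative to its unclipped peak, and count all harmonics (and no noise) as distortion. The sketch uses the closed-form fundamental and total mean-square values of the clipped sine. It bisects on $c$ until the THD equals 10% and then reports the fraction of each period spent in clipping.

```python
import math

def clipped_sine_thd(c):
    """THD of a unit-amplitude sine hard-clipped symmetrically at level c (0 < c < 1)."""
    th0 = math.asin(c)                                          # phase where clipping starts
    b1 = (2 / math.pi) * (th0 + c * math.cos(th0))              # fundamental amplitude
    ms = (2 / math.pi) * (th0 / 2 - math.sin(2 * th0) / 4
                          + c * c * (math.pi / 2 - th0))        # total mean-square value
    harm = max(ms - b1 * b1 / 2, 0.0)                           # power in all harmonics
    return math.sqrt(harm / (b1 * b1 / 2))

def clip_fraction_at_thd(target=0.10):
    """Bisect on c for the target THD; return (c, fraction of time spent clipping)."""
    lo, hi = 0.05, 0.9999            # THD decreases monotonically as c increases
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if clipped_sine_thd(mid) > target:
            lo = mid                 # too much distortion: clip less hard
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return c, 1 - 2 * math.asin(c) / math.pi

c, frac = clip_fraction_at_thd()
print(f"clip level = {c:.3f} of the unclipped peak, clipping {frac:.0%} of the time")
```

For this idealized case, 10% THD corresponds to a clip level of roughly 0.78 of the unclipped peak and a clipping fraction in the low forties of percent, in line with the figure quoted above.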

To examine this in more detail, consider again the feedback loop shown earlier in Fig. 9. By definition the duty-cycle of the output signal is limited between 0% and 100%. This also puts a limit on the input signal that can be amplified linearly. If the input signal exceeds this limit the triangular signals  $V_P$  and  $V_M$  no longer intersect and diverge in opposite directions. Figure 17 shows the integrator voltages  $V_P$  and  $V_M$  in case an increasing negative input signal is applied.

This situation occurs when the magnitude of the current $I_{IN}$ from the input VI converter exceeds the magnitude of the feedback current $I_{PWM}$ through the feedback resistor $R_1$. The class-D output stage stops switching and remains low as long as the input signal is too large. In a practical realization the outputs of the integrators cannot diverge indefinitely but are limited to the supply voltage. When the input signal is decreased, the signals $V_P$ and $V_M$ return to normal operation. In Fig. 18 the integrator signals $V_P$ and $V_M$ and the post-filter output signal $V_{LOAD}$ for a clipping signal are shown. Note that for clarity of the figure a different time scale is used for the graphs. As can be seen in Fig. 18(a), it takes some time for the signals $V_P$ and $V_M$ to return to normal operation, and the loop also needs some time to settle. This results in a typical ‘sticking’ behavior at the output followed by a second-order response, as shown in Fig. 18(b). This clipping recovery behavior is audible and is perceived as a ‘ticking’ noise that is clearly different from the ‘normal’ distortion that results from clipping.

**Fig. 17** Integrator signals $V_P$ and $V_M$ and comparator output **pwm** going from zero to negative clipping

**Fig. 18** Clipping recovery: (a) integrator outputs (b) post-filter output $V_{LOAD}$

The clipping recovery behavior can be improved significantly if the integrators are prevented from diverging during clipping. To do so the onset of clipping needs to be detected and corrective action needs to be taken to keep the integrators near their steady-state values.

Detection of clipping in the class-D feedback loop is fairly simple. As can be seen in Fig. 10, during normal operation a fixed sequence exists in the transitions of the reference clock signal *osc* and the comparator output signal *pwm*. A rising edge of the *osc* signal is always followed by a rising edge of the *pwm* signal and a falling edge of the *osc* signal is always followed by a falling edge of the *pwm* signal. A deviation from this sequence can be detected by a simple asynchronous logic circuit. The state transition diagram of such a circuit is shown in Fig. 19.



**Fig. 19** State transition diagram of clip detection logic

As long as the signals **osc** and **pwm** follow the correct sequence the circuit cycles through the states  $S_0$ ,  $S_1$ ,  $S_2$  and  $S_3$ . If a transition of the **osc** signal is followed by an opposite transition of the **osc** signal instead of the expected transition of the **pwm** signal the circuit jumps to state  $S_{1a}$  or to state  $S_{3a}$  and the output signal **clip** goes high to indicate that the loop is clipping. When the loop returns to normal operation the sequence of the **osc** and **pwm** signal is resumed and the circuit cycles through state  $S_0$ ,  $S_1$ ,  $S_2$  and  $S_3$  again.
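Fig. 19 is not reproduced here, so the sketch below is only one possible reading of the description, written as a hypothetical event-driven model. In the normal cycle, $S_0$ to $S_3$ correspond to the (osc, pwm) levels (0,0), (1,0), (1,1) and (0,1). An osc edge arriving in $S_1$ (pwm stuck low) or $S_3$ (pwm stuck high) sets the clip flag. The assignment of these two clip states to $S_{1a}$ and $S_{3a}$ is an assumption. Any later pwm edge resumes the normal cycle at the state that matches the current levels.

```python
# Normal cycle S0 -> S1 -> S2 -> S3, indexed by the (osc, pwm) levels
LEVEL_STATE = {(0, 0): "S0", (1, 0): "S1", (1, 1): "S2", (0, 1): "S3"}

class ClipDetector:
    """Hypothetical event-driven model of the clip detection logic (Fig. 19 not reproduced)."""

    def __init__(self):
        self.osc, self.pwm = 0, 0
        self.state = "S0"

    @property
    def clip(self):
        return self.state in ("S1a", "S3a")

    def osc_edge(self):
        self.osc ^= 1
        if self.state == "S1":          # osc fell before pwm rose: output stuck low
            self.state = "S1a"
        elif self.state == "S3":        # osc rose before pwm fell: output stuck high
            self.state = "S3a"
        elif self.state in ("S0", "S2"):
            self.state = LEVEL_STATE[(self.osc, self.pwm)]   # expected osc edge
        # in S1a/S3a further osc edges leave the clip flag asserted
        return self.clip

    def pwm_edge(self):
        self.pwm ^= 1
        # any pwm edge resumes the normal cycle at the state matching the levels
        self.state = LEVEL_STATE[(self.osc, self.pwm)]
        return self.clip

det = ClipDetector()
print([det.osc_edge(), det.pwm_edge(), det.osc_edge(), det.pwm_edge()])  # normal sequence
print([det.osc_edge(), det.osc_edge(), det.osc_edge(), det.pwm_edge()])  # negative clipping, then recovery
```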

During clipping the integrators diverge and do not cross zero anymore as can be seen in Fig. 18(a). Forcing the output of both integrators to cross zero each clock cycle can prevent this. For the first integrator this can be achieved by temporarily interrupting the current from the VI-converter  $g_{m0}$ . This will automatically cause the output  $V_P$  of the first integrator to change direction and return towards zero.

After a comparator detects a zero crossing the current from VI-converter  $g_{m0}$  is resumed. This system is shown schematically in Fig. 20. Note that a comparator to detect zero crossings of the first integrator was already included in the loop to build the silent start system described earlier. Also the mute feature that is built into the VI-converter can be reused to implement the interruption switch  $S_1$ .

For the second integrator a zero crossing can be forced by manipulating the edges of the **osc** signal. As can be seen in Fig. 21, the negative peaks of the triangular wave $V_M$ stop crossing zero when positive clipping starts. A zero crossing can be forced by delaying the edge of the **osc** signal. As Fig. 21 shows, this causes the falling slope of $V_M$ to become longer until it crosses zero. The zero crossing is detected by an additional comparator $A_2$, as shown in Fig. 22.

**Fig. 20** Forcing zero crossing of first integrator

**Fig. 21** Integrator output $V_M$ and **osc** with and without edge delay

**Fig. 22** Zero crossing detection of second integrator

**Fig. 23** State transition diagram for edge delay logic

An asynchronous logic circuit can be used to delay the  $osc$  edges accordingly. The state transition diagram of such an asynchronous logic circuit is shown in Fig. 23. The inputs of the edge delay logic are the  $osc$  signal and  $\text{sign}2$  signal. The output is a signal  $\text{oscout}$  that is high in states  $S_1$  and  $S_2$  and low in states  $S_0$  and  $S_3$ .

Under normal operating conditions the signal $V_M$ is positive ($\text{sign}2 = \text{high}$) during rising edges and negative ($\text{sign}2 = \text{low}$) during falling edges of the $osc$ signal. Consequently, the edges of the output signal $\text{oscout}$ coincide with the edges of the $osc$ signal, because the delay logic is always in state $S_0$ during rising edges and in state $S_2$ during falling edges of $osc$. Now suppose the integrator output $V_M$ is positive ($\text{sign}2 = \text{high}$) during a falling edge of the $osc$ signal, as is the case in Fig. 21. The delay logic is then stuck in state $S_1$. As soon as $V_M$ crosses zero, the delay logic jumps to state $S_2$ and then immediately to state $S_3$, causing $\text{oscout}$ to go low. In this way the falling edge of $\text{oscout}$ no longer coincides with the falling edge of $osc$ but is delayed until a zero crossing of $V_M$ is detected.
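The description above is enough to reconstruct a plausible set of transitions for the edge delay logic. The conditions below are inferred from the text, since Fig. 23 is not reproduced here. The machine is level-sensitive:

- $S_0 \to S_1$ when osc is high;
- $S_1 \to S_2$ when sign2 is low;
- $S_2 \to S_3$ when osc is low;
- $S_3 \to S_0$ when sign2 is high.

The output oscout is high in $S_1$ and $S_2$. The sketch lets the logic settle at each sample, as asynchronous logic would.

```python
# Output oscout for each state, as stated in the text
OSCOUT = {"S0": 0, "S1": 1, "S2": 1, "S3": 0}

def settle(state, osc, sign2):
    """Apply the (inferred) level-sensitive transitions until the state is stable."""
    while True:
        if state == "S0" and osc == 1:
            state = "S1"        # rising osc edge: oscout goes high
        elif state == "S1" and sign2 == 0:
            state = "S2"        # V_M has crossed zero downwards: falling edge now allowed
        elif state == "S2" and osc == 0:
            state = "S3"        # falling osc edge: oscout goes low
        elif state == "S3" and sign2 == 1:
            state = "S0"        # V_M has crossed zero upwards: rising edge now allowed
        else:
            return state

def edge_delay(samples, state="S0"):
    """samples: sequence of (osc, sign2) levels; returns the oscout level for each sample."""
    out = []
    for osc, sign2 in samples:
        state = settle(state, osc, sign2)
        out.append(OSCOUT[state])
    return out

normal = [(1, 1), (1, 1), (1, 0), (1, 0), (0, 0), (0, 0), (0, 1), (0, 1)]
clipping = [(1, 1), (1, 1), (1, 1), (1, 1), (0, 1), (0, 1), (0, 0), (0, 0)]
print(edge_delay(normal))     # oscout follows osc
print(edge_delay(clipping))   # falling edge held until sign2 goes low
```

In the clipping case the state passes through $S_2$ to $S_3$ within one settling step, which matches the "jumps to state $S_2$ and then immediately to state $S_3$" behavior described above.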

The resulting clipping behavior for the class-D amplifier is shown in Fig. 24. As can be seen, the integrator outputs remain close to zero during clipping, and the sticking and second-order response in the post-filter output have vanished completely. Figure 25 shows post-filter output signals measured on a class-D amplifier without and with the clipping recovery system active. As can be seen, the measured signals are in good agreement with the simulations.

**Fig. 24** Improved clipping recovery: (a) integrator outputs (b) post-filter output $V_{LOAD}$

**Fig. 25** Measured clipping behavior: (a) without clipping recovery system (b) with clipping recovery system

## 5 Conclusion

Robustness is arguably the most important requirement for a high power class-D audio amplifier. A failure in the assembly line or, even worse, at the end customer involves high costs associated with failure analysis and repair. On the other hand, for reasons of economy it is necessary to design as close to the edge as possible since headroom in the design usually translates directly to chip area and thus cost.

Second on the list of requirements is the absence of audible artifacts. Both robustness and audible artifacts relate directly to the quality perceived by equipment manufacturers and end customers alike.

## References

1. K. Nielsen, "A review and comparison of pulse width modulation (PWM) methods for analog and digital switching power amplifiers", *102nd Conv. AES*, 1997.
2. M. Berkhout, "Clock Jitter in Class-D Audio Power Amplifiers", *Proceedings ESSCIRC 2007*, pp. 444–447, September 2007.
3. C. Neesgaard, et al., "Class D Digital Power Amp (PurePath Digital™) High Q Musical Content", *Proceedings ISPSD 2004*, pp. 97–100, 2004.
4. P. van der Hulst, A. Veltman and R. Groeneweg, "An asynchronous switching high-end power amplifier", *112th Conv. AES*, 2002.
5. M. Berkhout, "An integrated 200-W Class-D audio amplifier", *Journal of Solid-State Circuits*, vol. 38, no. 7, pp. 1198–1206, July 2003.
6. M. Berkhout, "Integrated Overcurrent Protection System for Class-D Audio Power Amplifiers", *Journal of Solid-State Circuits*, vol. 40, no. 11, pp. 2237–2245, November 2005.
7. P. Wessels, et al., "Advanced BCD technology for automotive audio and power applications", *Solid-State Electronics*, vol. 51, pp. 195–211, 2007.
8. P. Morrow, E. Gaalaas and O. McCarthy, "A 20-W stereo class-D audio output stage in 0.6 μm BCDMOS technology", *Journal of Solid-State Circuits*, vol. 39, no. 11, pp. 1948–1958, November 2004.
9. M. Berkhout, "A Class-D Output Stage with Zero Dead Time", *ISSCC Dig. Tech. Papers*, pp. 134–135, February 2003.
10. F. Nyboe, et al., "A 240 W Monolithic Class-D Audio Amplifier Output Stage", *ISSCC Dig. Tech. Papers*, pp. 1346–1355, February 2006.

## Part III

# Power Management

The last chapter of this book covers Power Management. With the increasing use of portable devices and more focus on energy savings and environmental friendliness, power is now an important aspect of current electronic system design. On the one hand, this has resulted in low-power design techniques for high-speed, high-complexity functions, where chip heating is a real limitation. On the other hand, it has resulted in advanced power management techniques for the supply side of these low-power systems. Power management circuits are now broadly used and, depending on the application, have very different requirements. These include high efficiency, high accuracy and low noise, a low bill of materials and small area, very low standby power, high robustness, a high level of integration and flexibility.

The first paper deals with single-inductor, multiple-output DC-DC converters. This is a new area in the field of power management, where the current of a single inductor is time-shared between multiple outputs. Massimiliano Belloni et al. first describe different current switching schemes and the extra switches needed for boost, buck and buck-boost converters. A pragmatic and an analytical control loop strategy are then presented. The driving of the power switches is a special problem in multiple-output converters; dynamic well biasing, complementary switches and a self-boosted snubber are described. Finally, design examples of a two-output and a four-output DC-DC converter are given.

An enhanced ripple regulator, which improves DC regulation, reduces noise sensitivity and helps avoid fast-scale instability, is the subject of the next paper. Richard Redl first describes traditional ripple regulators: hysteretic, constant-on-time, constant-off-time, constant-frequency and Vsquare architectures. An enhanced ripple regulator with a combined voltage-error and ripple amplifier is then proposed and its compensation is described. A further improvement is obtained by the ramp pulse modulation technique, which reduces the unloading overshoot of a constant-on-time ripple regulator. An implementation of this converter is described.

The third paper, by Ivan Koudar, addresses the specific requirements of a robust DC-DC converter for automotive applications. Among the possible topologies, peak and average current-mode control show better rejection of fast battery transients and provide implicit current limitation. Despite its higher complexity, average current mode is chosen for its better noise immunity. It has been implemented in a buck-boost converter for a gateway SOC. The complex regulation loop and several circuits, such as static and dynamic current sensing and limitation, mode control, power-stage control, the charge pump and start-up, are then described in detail. The fabricated silicon validates the selected concepts.

A highly integrated power management IC in an advanced CMOS process technology is described by Mario Manninger in the fourth paper. Portable devices combine more and more functions with very different supply requirements, such as intelligent battery chargers, high current buck converters for digital and baseband functions, low-dropout regulators for RF and audio codec functions, buck-boost and hybrid converters for flash memory and boost converters for lighting management. The paper describes the different architectures, the converter and battery models used in the design and the resulting complex power management IC.

The fifth paper, by Lázaro Marco et al., reviews adaptive power management techniques for power-demanding loads in portable devices. Envelope tracking for polar RF power transmitters and on-chip line drivers for power line communications are two examples of such loads, which require wideband, efficient power amplification. The system aspects and design challenges of the former are described first. Different topologies for wideband efficient amplification, such as a three-level buck converter and a linear-assisted switching power amplifier, are then elaborated. Finally, two alternative modulation and control techniques are discussed.

The last paper, by Vadim Ivanov, focuses on the limitations of low-dropout voltage regulators. Traditional LDO circuits require a minimum or maximum load capacitance and react slowly to a load step. By applying the structural design methodology to the embedded LDO circuit, any-load-stable regulators with very fast load regulation, low noise and high power supply rejection can be developed. The paper first describes the structural design methodology, with flow graphs, dedicated feedback control and a library of elementary cells. This methodology is then applied to LDO regulators with very different requirements: a memory-retention LDO with very low quiescent current, a low-noise LDO for radio functions and an instant-load-step regulator for digital circuits.

# Single-Inductor Multiple-Output DC-DC Converters

Massimiliano Belloni, Edoardo Bonizzoni and Franco Maloberti

**Abstract** This chapter deals with design methodologies for DC-DC converters with multiple outputs and only one inductor. The four possible schemes, buck, boost, and inverting or non-inverting buck-boost, are considered. The key specific problems are the inductor current switching scheme, the multiple-loop control, the suitable driving of the power switches and, accordingly, the converter power efficiency. All these issues are discussed in detail. Moreover, design examples of devices integrated in CMOS technologies and their experimental results are presented.

## 1 Introduction

The continuous market growth of battery-operated systems and the need to optimize the power consumption of multi-processors by dynamically regulating the supply voltage significantly expand the portable power management market. In addition to conventional DC-DC devices [1], there is an increasing need for DC-DC converters capable of generating many outputs with a single inductor. The reason is that when a system requires multiple supply voltages, the increased PCB area, the larger number of components, and the reduced reliability caused by the many inductors become problematic [2, 3]. The devices discussed here, which can be boost, buck or buck-boost, provide a solution. To obtain multiple outputs, the inductor current must be time-shared among the various loads. For this, the feedback loop that regulates the voltage becomes a multi-feedback loop, with potential stability problems and possible ringing of the outputs. Moreover, buck architectures require extra power switches. In addition, in these converters the switches must separate voltages that can differ significantly. Therefore, the power switch drivers must account for problems that are specific to the multiple-output function.

---

M. Belloni (✉)

University of Pavia, Department of Electronics, Via Ferrata, 1 - 27100 Pavia – Italy  
e-mail: massimiliano.belloni@unipv.it

## 2 SISO DC-DC Switching Regulators

Figure 1 shows the four conventional Single-Inductor Single-Output (SISO) DC-DC switching regulators, [1].



**Fig. 1** SISO DC-DC switching regulators conventional topologies

The switches charge and discharge the inductor in non-overlapping phases, each a fraction of the switching period. A control loop, not shown, keeps the output voltage as close as possible to its set value. Figures 1(a) and 1(b) are the conventional buck and boost architectures, while the inverting and non-inverting buck-boost schemes are shown in Fig. 1(c) and 1(d), respectively. Transforming the four schemes of Fig. 1 into multiple-output versions requires duplicating the load-side switch for the inductor current time-sharing. For the scheme of Fig. 1(a), extra switches must be added on the load side, because single-output buck operation has no switch at that point.
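To make the role of the four topologies of Fig. 1 concrete, the sketch below evaluates the textbook ideal conversion ratios $V_{out}/V_{in}$ in continuous conduction. It is a minimal illustration, assuming lossless components and, for the non-inverting buck-boost, that both switch legs are driven with the same duty cycle $D$. The function name `conversion_ratio` is hypothetical.

```python
def conversion_ratio(topology: str, d: float) -> float:
    """Ideal CCM voltage conversion ratio Vout/Vin for duty cycle d.

    Textbook lossless relations; the non-inverting buck-boost entry assumes
    both switch legs are driven with the same duty cycle (an assumption).
    """
    if not 0.0 <= d < 1.0:
        raise ValueError("duty cycle must be in [0, 1)")
    ratios = {
        "buck": d,                                  # Fig. 1(a)
        "boost": 1.0 / (1.0 - d),                   # Fig. 1(b)
        "inverting_buck_boost": -d / (1.0 - d),     # Fig. 1(c)
        "noninverting_buck_boost": d / (1.0 - d),   # Fig. 1(d)
    }
    if topology not in ratios:
        raise ValueError(f"unknown topology: {topology}")
    return ratios[topology]


for name in ("buck", "boost", "inverting_buck_boost", "noninverting_buck_boost"):
    print(f"{name:>24}: Vout/Vin = {conversion_ratio(name, 0.6):+.3f}")
```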

## 3 Inductor Current Switching Schemes

A DC-DC switching regulator with multiple outputs time-shares the inductor current among different loads. The sharing strategy depends on the operating mode: discontinuous conduction, in which the inductor current is clamped at zero, or continuous conduction [1]. For a double boost scheme [4, 5], like the one shown in Fig. 2, there are three time slots: one to charge the inductor and two to discharge it into the two loads. For discontinuous mode there are mainly two options: a time-interleaved operation, with one clock period fully dedicated to one load and the next clock period to the other (Fig. 3(a)), or the configuration shown in Fig. 3(b). Figure 3(a) shows that the inductor charging slots differ between the even and odd periods ($T_{A1}$, $T_{B1}$). By contrast, Fig. 3(b) uses a single charging slot per period, followed by a sequence of two slots that deliver the power to loads A and B.

**Fig. 2** Single inductor double output DC-DC boost converter

**Fig. 3** Inductor current in the two-outputs boost converter (discontinuous conduction mode)

The advantage of the method of Fig. 3(a) is that there is no cross-regulation between the two outputs. However, the second method can use a lower switching frequency, as the output voltage ripple is lower. Both methods can be extended to more outputs and, possibly, hybrid methods can be used, with time interleaving of pairs of output loads, each pair being served within the same switching period. Moreover, the switching scheme of Fig. 3(b) is suitable for continuous conduction mode.
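The "charge, then A, then B" sequence of Fig. 3(b) can be checked with simple piecewise-linear arithmetic. The sketch below assumes ideal, lossless components and discontinuous conduction. The helper `sido_boost_dcm` and the parameter `i_split` (the inductor current at which delivery moves from output A to output B) are hypothetical illustrations, not the control actually used in [4, 5].

```python
from dataclasses import dataclass


@dataclass
class SlotResult:
    i_peak: float   # inductor current at the end of the charge slot (A)
    t_a: float      # duration of the slot delivering to output A (s)
    t_b: float      # duration of the slot delivering to output B (s)
    i_avg_a: float  # average current into output A over one period (A)
    i_avg_b: float  # average current into output B over one period (A)


def sido_boost_dcm(v_in, v_a, v_b, l, t_s, t_1, i_split):
    """Ideal 1-2A-2B sequence of a two-output boost in DCM."""
    if min(v_a, v_b) <= v_in:
        raise ValueError("boost outputs must exceed the input voltage")
    i_peak = v_in * t_1 / l                       # slot 1: inductor across v_in
    if not 0.0 <= i_split <= i_peak:
        raise ValueError("i_split must lie between 0 and i_peak")
    t_a = (i_peak - i_split) * l / (v_a - v_in)   # current falls at (v_a - v_in)/l
    t_b = i_split * l / (v_b - v_in)              # current falls at (v_b - v_in)/l
    if t_1 + t_a + t_b > t_s:
        raise ValueError("current does not return to zero: not DCM")
    q_a = 0.5 * (i_peak + i_split) * t_a          # trapezoid area = charge to A
    q_b = 0.5 * i_split * t_b                     # triangle area = charge to B
    return SlotResult(i_peak, t_a, t_b, q_a / t_s, q_b / t_s)


r = sido_boost_dcm(v_in=2.0, v_a=6.0, v_b=10.0, l=10e-6, t_s=4e-6, t_1=2e-6, i_split=0.2)
print(r)
```

With these illustrative values the inductor peaks at 0.4 A and the two delivery slots last 0.5 µs and 0.25 µs, so the current returns to zero well within the 4 µs period.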

For buck converters, a two-output scheme (Fig. 4) [6] is considered here, but the solution can be extended to more output loads [7]. In this case, both the conventional buck switches $M_1$ and $M_2$ and the current-sharing switches $S_1$ and $S_2$ have to be controlled. The two switching instants are independent, since the commutation of the $M$ transistors can occur before or after that of the $S$ transistors. The resulting output branch currents have the profiles shown in Fig. 5(a) and (b), for $D < D_1$ and $D > D_1$, respectively.
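The following sketch integrates the inductor current of the SIDO buck over one switching period, which helps visualise the two cases of Fig. 5. The switch mapping is an assumption made for illustration: $M_1$ connects the inductor to $V_{in}$ for $t < D\,T_s$, and $S_1$ routes the current to output A for $t < D_1 T_s$. Ideal switches and continuous conduction are also assumed, and `sido_buck_period` is a hypothetical name.

```python
def sido_buck_period(v_in, v_a, v_b, l, t_s, d, d1, i0, steps=1000):
    """Integrate the inductor current over one period of a two-output buck.

    Returns (current at end of period, average current into A, into B).
    """
    dt = t_s / steps
    i = i0
    q_a = q_b = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                     # mid-step sample avoids edge ambiguity
        v_sw = v_in if t < d * t_s else 0.0    # M1 on -> v_in, M2 on -> ground
        to_a = t < d1 * t_s                    # S1 on -> output A, S2 on -> output B
        v_out = v_a if to_a else v_b
        i_next = i + (v_sw - v_out) / l * dt   # constant slope within a step
        q = 0.5 * (i + i_next) * dt            # charge of a linear segment
        if to_a:
            q_a += q
        else:
            q_b += q
        i = i_next
    return i, q_a / t_s, q_b / t_s


# D = 0.75 > D1 = 0.6: the case of Fig. 5(b); L = 22 uH, 1 MHz as in the SIDO buck example.
i_end, i_a, i_b = sido_buck_period(v_in=3.6, v_a=3.3, v_b=1.8, l=22e-6,
                                   t_s=1e-6, d=0.75, d1=0.6, i0=0.5)
print(f"i_end = {i_end:.6f} A, I_A = {i_a:.4f} A, I_B = {i_b:.4f} A")
```

These duty cycles satisfy the volt-second balance $D V_{in} = D_1 V_A + (1 - D_1) V_B$ (0.75 · 3.6 V = 0.6 · 3.3 V + 0.4 · 1.8 V = 2.7 V), so the inductor current ends the period where it started.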

In a more complex situation, with four outputs, the switching instants and the inductor current look like the diagram of Fig. 6 [7, 8].

Note that the switching of transistors $M_i$ and $S_i$ requires a proper non-overlap (dead) time to avoid short circuits.
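One simple way to express the non-overlap requirement for the four load switches of Fig. 6 is to end each load slot a dead time $t_{dead}$ early (break-before-make). The sketch below is illustrative only: the function name, the slot fractions and the 10 ns dead time are assumptions. During each gap, the inductor current must flow through some other path, for example a snubber.

```python
def load_switch_schedule(t_s, fractions, t_dead):
    """Return (on, off) instants for consecutive load switches in one period.

    Each slot nominally lasts fractions[i] * t_s; its switch is turned off
    t_dead early, so the next switch turns on only after a dead time.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("slot fractions must sum to 1")
    schedule = []
    t = 0.0
    for f in fractions:
        on, off = t, t + f * t_s - t_dead
        if off <= on:
            raise ValueError("slot shorter than the dead time")
        schedule.append((on, off))
        t += f * t_s
    return schedule


slots = load_switch_schedule(t_s=1e-6, fractions=[0.4, 0.3, 0.2, 0.1], t_dead=10e-9)
for k, (on, off) in enumerate(slots, start=1):
    print(f"load {k}: on {on * 1e9:6.1f} ns, off {off * 1e9:6.1f} ns")
```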

The switching scheme of Fig. 6 suffers from cross-regulation, because increasing the time slot of one load affects the power delivered to the other loads; the control system must be able to handle this problem.



**Fig. 4** DC-DC SIDO buck converter



**Fig. 5** SIDO buck output branches currents ( $I_A$ ,  $I_B$ ) in the two cases  $D < D_1$  (a) and  $D > D_1$  (b)



**Fig. 6** 4-output SIMO buck inductor and output branches currents ( $I_A$ ,  $I_B$ ,  $I_C$ ,  $I_D$ )

Figure 7 shows a two-output buck-boost converter. In addition to their normal buck-boost function, the switches connected to the output loads also perform the current sharing.

The switching scheme is the same as for the boost circuit, with $M_2$ controlled by the logical OR of the $S_A$ and $S_B$ control signals.



Fig. 7 DC-DC SIDO buck-boost converter



Fig. 8 SIDO buck-boost main switches currents ( $I_{M1}, I_{M2}, I_{M3}$ ) (a) and output branches currents ( $I_A, I_B$ ) (b)

## 4 Multiple-Loop Control Strategy

Figure 9 shows the conventional control loop. Its operation is well known: the output voltage is subtracted from the set voltage to obtain the output error, $\varepsilon$. The amplified error is compared with a sawtooth signal whose period is the inverse of the switching frequency, and the resulting pulse and its complement control the power switches.
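A behavioural sketch of the comparator of Fig. 9 follows, assuming trailing-edge modulation with a sawtooth that ramps from 0 to `v_ramp_peak` each period. The gain, ramp amplitude and function names are hypothetical. It shows that the duty cycle equals the amplified error divided by the ramp amplitude.

```python
def pwm_gate(t, v_set, v_out, gain, v_ramp_peak, f_sw):
    """Gate command at time t: on while the amplified error exceeds the ramp."""
    error = v_set - v_out                      # output error, as in Fig. 9
    v_control = gain * error                   # ideal error amplifier
    ramp = ((t * f_sw) % 1.0) * v_ramp_peak    # sawtooth with period 1/f_sw
    return v_control > ramp


def duty_cycle(v_set, v_out, gain, v_ramp_peak, f_sw, samples=1000):
    """Fraction of one period with the gate on (sampled at mid-points)."""
    period = 1.0 / f_sw
    on = sum(
        pwm_gate((k + 0.5) * period / samples, v_set, v_out, gain, v_ramp_peak, f_sw)
        for k in range(samples)
    )
    return on / samples


print(duty_cycle(v_set=1.8, v_out=1.7, gain=6.0, v_ramp_peak=1.0, f_sw=1e6))
```

For example, a 100 mV error amplified by 6 against a 1 V ramp gives a 60% duty cycle, while a negative error keeps the switch off.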

For $N$ outputs, $N$ control loops are needed, whose outputs determine the switching times discussed in the previous section. One of these times drives the switches for conventional operation, and the others drive the switches on the load side. The control strategies can be “pragmatic” or “analytical”. The former is driven by logic considerations; the latter is an extension, albeit not yet fully investigated, of the theory used for single-output schemes.

**Fig. 9** Switching regulator closed loop control system

The pragmatic method is presently used in boost schemes, because the single-output theory is more difficult to generalize for them. Two approaches are reported here. The one used in one of the design examples presented later in this chapter is pragmatic and is based on the following observation: the largest error indicates the power needed by the entire converter. Therefore, the switching of $M_2$ is controlled by this largest error. The current time-sharing is defined by treating one output as “master” and the other as “slave”: the master output is measured, and when it reaches its set value the current is delivered to the slave.
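The pragmatic strategy can be summarised in two small functions. This is a simplified sketch under stated assumptions, not the logic of the fabricated chip. The largest error, scaled by a hypothetical gain and clamped to a hypothetical maximum duty `d_max`, sets the charging time of $M_2$. The inductor current goes to the master output until that output reaches its set value.

```python
def charge_duty(v_set, v_out, gain, d_max=0.9):
    """Duty cycle of the charging switch M2, set by the largest output error."""
    largest_error = max(vs - vo for vs, vo in zip(v_set, v_out))
    return min(max(gain * largest_error, 0.0), d_max)   # clamp to [0, d_max]


def route_current(v_master, v_master_set):
    """Deliver to the master until it reaches its set value, then to the slave."""
    return "master" if v_master < v_master_set else "slave"


# Hypothetical two-output boost: output A is the master, output B the slave.
print(charge_duty(v_set=[7.0, 7.0], v_out=[6.9, 6.95], gain=5.0))
print(route_current(v_master=6.9, v_master_set=7.0))
```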

The other approach [8] uses suitable variables $\mathbf{E}$ as the input of a feedback control. The problem is to find a vector $\mathbf{T}$ of $N$ time instants as a function of the input $\mathbf{E}$:

$$\mathbf{T} = f(\mathbf{E}) \quad (1)$$

The adopted solution uses a linear processing of $\mathbf{E}$, leading to a linear system of equations:

$$\left\{ \begin{array}{l} T_1 = a_{11}\varepsilon_1 + a_{12}\varepsilon_2 + \dots + a_{1N}\varepsilon_N \\ T_2 = a_{21}\varepsilon_1 + a_{22}\varepsilon_2 + \dots + a_{2N}\varepsilon_N \\ \dots \\ T_N = a_{N1}\varepsilon_1 + a_{N2}\varepsilon_2 + \dots + a_{NN}\varepsilon_N \end{array} \right. \quad (2)$$

It is convenient to use the errors $\varepsilon_i = V_{out,i} - V_{set,i}$ as the vector of input variables. Therefore:

$$\mathbf{T} = k\mathbf{A}\boldsymbol{\varepsilon} \quad (3)$$

where $k$ denotes the overall loop gain.

The various solutions differ in the choice of the matrix $\mathbf{A}$. As examples, the following matrices can be considered:

$$\mathbf{A} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \quad (4)$$

that is, each time is controlled by a single error, or

$$\mathbf{A} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & -1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & 1 & 1 & -1 \end{bmatrix} \quad (5)$$

where the first row controls the main switching. This solution has shown better performance for buck converters with $N > 2$.
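Equation (3) is a plain matrix-vector product, so the effect of the two matrices can be checked numerically. The sketch below uses $N = 4$ and the sign convention $\varepsilon_i = V_{out,i} - V_{set,i}$ of the text. The error vector and gain are arbitrary values chosen only for illustration.

```python
N = 4
A_DIAG = [[1 if i == j else 0 for j in range(N)] for i in range(N)]   # Eq. (4)
A_MIXED = [                                                          # Eq. (5)
    [1, 1, 1, 1],
    [1, -1, -1, -1],
    [1, 1, -1, -1],
    [1, 1, 1, -1],
]


def time_controls(a, eps, k):
    """Eq. (3): T = k * A * eps, computed row by row."""
    return [k * sum(a_ij * e_j for a_ij, e_j in zip(row, eps)) for row in a]


eps = [0.01, 0.02, 0.03, 0.04]   # illustrative errors V_out,i - V_set,i (V)
print(time_controls(A_DIAG, eps, k=2.0))
print(time_controls(A_MIXED, eps, k=1.0))
```

With matrix (5), every time control depends on all four errors, whereas matrix (4) simply scales each error by $k$.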

The analytical method can also be used for boost configurations, provided that the coefficients ensure the condition

$$T_i > T_1, i > 1 \quad (6)$$

where  $T_1$  is the charging period.

## 5 Power Switches Driving

As mentioned above, an important design issue in multiple-output DC-DC converters is the power switch driving strategy, as it affects the overall system effectiveness in terms of both area and power.

In any single- or multiple-output DC-DC converter, the switch connected to the battery and the one connected to ground are naturally p-channel and n-channel devices, respectively. By contrast, the switches on the load side can be problematic. There are three possibilities: a p-channel, an n-channel or a complementary switch. The choice depends on the expected regulated voltage and on the cost-efficiency trade-off. If the regulated voltage is relatively large, much higher than the transistor threshold, a p-channel switch is a good solution: the overdrive is sufficient, and the series conductance $G_{on}$ of the extra switch can be made adequate with a reasonable transistor aspect ratio:

$$G_{on} = \mu C_{ox} \frac{W}{L} (V_{out} - V_{Th}) \quad (7)$$

**Fig. 10** Schematic diagram of the bulk biaser circuits

For a modern power technology, the parameter $\mu C_{ox}$ is in the $100\text{--}200\ \mu\text{A/V}^2$ range. Therefore, a switch resistance of $1\ \Omega$ is obtained with a $W/L$ on the order of 3500, which is large but acceptable. As is well known, the threshold voltage changes because of the body effect, and cancelling it requires connecting the substrate to the source. This is possible with a single output and n-well technologies, but not with multiple outputs, because the terminal connected to the inductor can be at a higher voltage than the switched output. The limitation is significant, because the body effect can increase the threshold by $100\text{--}200\text{ mV}$; with $500\text{ mV}$ of overdrive, the series resistance then becomes 25% and 66% higher, respectively. The solution is to tie the well to the higher of the two switched-terminal voltages [6].
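The figures quoted around Eq. (7) can be reproduced with a few lines of arithmetic. The text does not state the overdrive behind the $W/L \approx 3500$ estimate; solving Eq. (7) for it gives roughly 1.4 V to 2.9 V across the quoted $\mu C_{ox}$ range. The body-effect penalty follows from $R_{on} \propto 1/V_{ov}$: losing 100 mV or 200 mV of a 500 mV overdrive raises $R_{on}$ by 25% and about 66.7%, respectively. The function names are hypothetical.

```python
def required_overdrive(r_on, mu_cox, w_over_l):
    """Overdrive that gives r_on for a given aspect ratio, from Eq. (7)."""
    return 1.0 / (r_on * mu_cox * w_over_l)


def resistance_increase(v_ov, delta_vth):
    """Fractional R_on increase when the body effect raises |V_Th| by delta_vth."""
    return v_ov / (v_ov - delta_vth) - 1.0


for mu_cox in (100e-6, 200e-6):
    v_ov = required_overdrive(1.0, mu_cox, 3500)
    print(f"muCox = {mu_cox * 1e6:.0f} uA/V^2 -> Vov = {v_ov:.2f} V for 1 ohm")
for dvth in (0.1, 0.2):
    print(f"dVth = {dvth * 1e3:.0f} mV -> R_on +{resistance_increase(0.5, dvth) * 100:.1f}%")
```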

The dynamic well-biasing technique is illustrated in Fig. 10 for a two-output buck converter. The circuits *BB1* and *BB2* dynamically bias the bulk of transistors $M_2$ and $M_3$ through two n-channel switches controlled by a logic signal and its inverse. The logic signal is generated by a simple comparator that compares the switching node $IND+$ with the output voltage $V_{OUT1}$ (for *BB1*) or $V_{OUT2}$ (for *BB2*). Therefore, the body effect is cancelled during the on-phases, improving the converter power efficiency. Note that a possible offset of the comparator's differential pair is not critical, because problems arise only when the substrate bias error reaches several tens of mV.
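The bulk-biaser decision can be modelled in a few lines to see why the comparator offset is not critical. This is an idealized behavioural sketch, not the transistor-level circuit of Fig. 10. It selects IND+ or the output as the well voltage, and the sweep shows that the well never sits more than the offset magnitude below the higher terminal. The 5 mV offset and the function names are assumptions.

```python
def bulk_voltage(v_ind, v_out, offset=0.0):
    """Well voltage chosen by a comparator with input offset `offset` (V)."""
    return v_ind if (v_ind - v_out) > offset else v_out


def bulk_error(v_ind, v_out, offset=0.0):
    """How far the well sits below the higher switch terminal (V)."""
    return max(v_ind, v_out) - bulk_voltage(v_ind, v_out, offset)


# Sweep IND+ from 50 mV below to 50 mV above a 3.3 V output, 5 mV offset.
worst = max(bulk_error(3.3 + dv * 1e-3, 3.3, offset=5e-3) for dv in range(-50, 51))
print(f"worst-case well error: {worst * 1e3:.1f} mV")
```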

Another possible solution is to use complementary power switches, but there are limits: the silicon area almost doubles and the power required to charge and discharge the gates of the power transistors increases significantly. Therefore, complementary switches can be used only in very-low-current applications, where the sizing of the power switches is not an issue.

The remaining solution is to use an n-channel transistor, which requires a gate voltage higher than the supply to be fully turned on. As is well known, such a voltage can be provided by charge pumps [9]: switching one or more pumping capacitors makes it possible to reach high voltages, such as those required by non-volatile memories. However, in multiple-output boost schemes, the gate capacitance of the power transistor can be as high as 15 pF, and the corresponding charge that must be provided leads to area and efficiency issues.
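To put the 15 pF figure in perspective, the sketch below gives a first-order estimate of the charge per cycle and of the supply power needed to drive such a gate. The 5 V gate swing, the 1 MHz switching frequency and the 50% charge-pump efficiency are hypothetical values. The estimate ignores the pump's own area and switching overheads.

```python
def gate_drive_estimate(c_gate, v_gate, f_sw, pump_efficiency):
    """First-order gate-drive budget for one power switch.

    Returns the charge delivered per turn-on (C*V) and the average power
    drawn from the supply, C*V^2*f divided by the charge-pump efficiency.
    """
    q_per_cycle = c_gate * v_gate
    p_supply = c_gate * v_gate ** 2 * f_sw / pump_efficiency
    return q_per_cycle, p_supply


# 15 pF from the text; 5 V swing, 1 MHz and 50 % pump efficiency are assumptions.
q, p = gate_drive_estimate(c_gate=15e-12, v_gate=5.0, f_sw=1e6, pump_efficiency=0.5)
print(f"charge per cycle: {q * 1e12:.0f} pC, supply power: {p * 1e6:.0f} uW")
```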

A different approach, named the self-boosted snubber, is shown in Fig. 11 [7]. The method exploits the non-overlap time that must be inserted when switching between loads. During these non-overlap periods the inductor current must be properly handled, as is done with a snubber. In the circuit of Fig. 11, this current boosts the voltage across $C_{snub}$ through a diode, D. The charge stored on the capacitor is the reservoir that makes it possible to switch on the power n-MOS, $S_A$. The switching is controlled by two transistors: the n-type switches $S_A$ off and the p-type switches it on. To ensure that, in the off state, the gate control of the p-channel is equal to or higher than the boosted voltage, a low-power charge pump (*cp* in Fig. 11) suitably raises the supply voltage. Thus, the current flowing during the non-overlap periods is used to store the energy necessary to drive the power MOS.

**Fig. 11** Schematic diagram of the self-boosted snubber circuit

Switching the power MOS in a multiple-output boost converter raises two problems: protecting the driver, which is normally designed with the low-voltage option of the technology, and separating the high-voltage outputs. The simplest way to avoid short circuits and ensure good isolation of the outputs is to use diodes (possibly external) in addition to suitable drivers. Schottky diodes, thanks to their low forward drop, are the optimum solution. Figure 12 shows a scheme that allows low-voltage devices to control transistors capable of sustaining high voltages (drain-extended devices) [5]. The current source $M_1$ and the resistor $R_1$ generate a bias that clamps the on-state gate drive of the power p-MOS. The on-to-off transition is made fast by transistor $M_2$, which quickly discharges the $C_{GS}$ of the power p-MOS. The low-voltage devices are protected by the drain-extended transistors $M_{P1}$, $M_{P2}$ and $M_{P3}$.



**Fig. 12** Possible driver of the power switch in boost DC-DC

## 6 Design Examples

The design techniques discussed in the previous sections are the basis for the design and implementation of integrated SIDO and SIMO DC-DC converters. The features and experimental results of three of them are discussed here.

Figure 13 shows the block diagram of a two-output boost converter for backlight applications realized in a 130-nm CMOS technology [5]. The circuit can generate two independent voltages of up to 11 V with a maximum current of 28 mA.

A dual-output DC-DC converter for backlight applications requires independent control of the output voltages and currents, because certain lighting functions require different light intensities as well as the ability to selectively turn the backlight sources on or off.

The circuit uses the “1-2A-2B” current switching method with a pragmatic control like the one previously described. The power p-MOS switches are drain-extended devices that can sustain a drain-source voltage ($V_{DS}$) of up to 20 V. The external Schottky diodes prevent reverse currents under all operating conditions. Figure 14 shows the layout of the chip with some details of the floor plan.

Figure 15 shows the power n-MOS drain voltage and two steady-state output voltages, both at about 7 V, supplying 5 mA and 30 mA, respectively. The drain voltage clearly shows the “1-2A-2B” control scheme, with output B (high voltage) served first and followed by output A. The subsequent ringing is caused by the inductance and the parasitic capacitance.

Figure 16 shows the power efficiency of the converter versus output voltage with both outputs at 28 mA. It also shows that, as the output voltage decreases to about 4.5 V, the efficiencies for the 28-mA and 10-mA cases converge to the same value, as the power loss becomes dominated by switching loss.



**Fig. 13** Schematic diagram of a SIDO boost



Fig. 14 Layout of the SIDO boost converter



Fig. 15 Significant waveforms of the SIDO boost

The second design example is a two-output buck converter [6]. Its basic scheme is shown in Fig. 17. The loop control uses a simple diagonal coefficient matrix: the main switching is controlled by one of the errors and the time-sharing by the other.



**Fig. 16** Efficiency of the SIDO boost



**Fig. 17** Circuit diagram of the proposed SIDO buck

The PWM generator uses the same waveform (in this case a triangular wave) to obtain both control phases. The switching frequency is 1 MHz and the error gain provided by $A_1$ and $A_2$ is approximately 45 dB. Switches $S_1$ and $S_4$ are naturally realized with n-channel and p-channel devices, respectively. Since the regulated voltage is assumed to be relatively high, $S_2$ and $S_3$ are p-channel devices. The sizes of $M_1$, $M_2$, and $M_3$ are $18000/0.6\ \mu\text{m}$, while the NMOS transistor $M_4$ is $6000/0.6\ \mu\text{m}$. These aspect ratios result from a trade-off between the on-resistance of the switches, the silicon area, and the dynamic power consumption. The channel length used (about twice the minimum) minimizes transistor leakage currents and ensures proper ESD protection.

The circuit achieves optimum substrate biasing by dynamically switching the bulk between the voltages at the switch terminals, as discussed in the previous section. The single-inductor dual-output buck converter has been fabricated in a $0.35\text{-}\mu\text{m}$ CMOS technology with a 5-V transistor option, p-substrate, 2 poly and 3 metal levels. Figure 18 shows the chip microphotograph. The chip area is $1350\ \mu\text{m} \times 1800\ \mu\text{m}$, including pads. The power transistor area is about $0.22\text{ mm}^2$. To minimize the resistance and inductance of the bonding wires, a triple bonding approach has been adopted for the drain and source terminals of each power transistor. The off-chip inductor ($L$) and storage capacitors ($C_1$ and $C_2$) are $22\ \mu\text{H}$ and $35\ \mu\text{F}$, respectively.

Figure 19 shows the ripple in the measured output voltages. With a power supply of  $3.6\text{ V}$ , the voltages  $V_{OUT1}$  and  $V_{OUT2}$  are  $3.3\text{ V}$  and  $1.8\text{ V}$ , respectively. The achieved voltage ripple is  $31\text{ mV}$  for  $V_{OUT1}$  and  $24\text{ mV}$  for  $V_{OUT2}$ , with output currents of  $56\text{ mA}$  and  $40\text{ mA}$ , respectively.

Figure 20 shows the cross-regulation of the output voltages, with one output fixed at  $3.3\text{ V}$  ( $V_{OUT1}$ ) and the other ( $V_{OUT2}$ ) changing by  $680\text{ mV}$ , from  $1.42\text{ V}$  to  $2.1\text{ V}$ . The increase of the current on the second load from  $22\text{ mA}$  to  $33\text{ mA}$  does not affect  $V_{OUT1}$  at all.

**Fig. 18** Chip microphotograph

Fig. 19 Measured output voltages

Fig. 20 Measured step response ($V_{OUT1} = 3.3\text{ V}$, $V_{OUT2}$ stepped between $1.42\text{ V}$ and $2.1\text{ V}$)

Fig. 21 Measured power efficiency ($V_{dd} = 3.6\text{ V}$, $I_{out2} = 40.2\text{ mA}$)

Figure 21 shows the measured power efficiency. With the power supply set to $3.6\text{ V}$ and the second output current equal to $40.2\text{ mA}$, the efficiency, measured as a function of the first output current, reaches $93.3\%$ when both output voltages are set to $3.3\text{ V}$ and the overall output current is $124.8\text{ mA}$. When the output voltages are set to $3.3\text{ V}$ and $1.8\text{ V}$, respectively, the efficiency reaches $85.2\%$ at an overall output current of $190\text{ mA}$. In all measured conditions the efficiency is higher than $62.5\%$.
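The reported efficiency points can be turned into absolute power figures, assuming efficiency is defined as $P_{out}/P_{in}$. The first output current follows from the stated totals (124.8 mA − 40.2 mA = 84.6 mA, and 190 mA − 40.2 mA = 149.8 mA). The helper name `power_budget` is hypothetical.

```python
def power_budget(v_in, outputs, efficiency):
    """Absolute power figures from an efficiency point.

    outputs: list of (V_out, I_out) pairs; efficiency is P_out / P_in.
    """
    p_out = sum(v * i for v, i in outputs)
    p_in = p_out / efficiency
    return {"p_out": p_out, "p_in": p_in, "p_loss": p_in - p_out, "i_in": p_in / v_in}


# 93.3 % point: both outputs at 3.3 V, 124.8 mA in total, 40.2 mA on output 2.
peak = power_budget(3.6, [(3.3, 0.0846), (3.3, 0.0402)], 0.933)
# 85.2 % point: 3.3 V and 1.8 V, 190 mA in total, 40.2 mA on output 2.
mixed = power_budget(3.6, [(3.3, 0.1498), (1.8, 0.0402)], 0.852)
for name, res in (("93.3% point", peak), ("85.2% point", mixed)):
    print(f"{name}: Pout = {res['p_out'] * 1e3:.1f} mW, "
          f"loss = {res['p_loss'] * 1e3:.1f} mW, Iin = {res['i_in'] * 1e3:.1f} mA")
```

This gives roughly 30 mW of loss at the 93.3% point and about 98 mW at the 85.2% point.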

The third design example is a buck DC-DC converter with four output voltages [7]. The control subsystem, together with the drivers, provides the four control signals using the control scheme described by equation (5).

Figure 22 shows the processing block diagram, including the PWM output pulses. $H(s)$ in the main path is a first-order zero-pole filter that provides the loop compensation, while the $A$ blocks in the sharing paths are simple amplifiers. The main path, driven by $H(s)X_1$, controls the main switches $MP$ and $MN$, while the other paths, driven by $AX_2$, $AX_3$, and $AX_4$, manage the sharing of the inductor current, thus determining the four time-sharing slots.

The analog processor is realized with switched-capacitor circuits that implement the error combinations of equation (5) as well as the other functions. Figure 23 shows the first processing channel, which consists of three sections. The first section combines the errors and provides a gain of 5, while the second section is the zero-pole switched-capacitor filter. The branch including $C_5$ and $V_{bias}$ provides a DC level shift. Finally, the flip-around double sample-and-hold decouples the filter from the PWM, limiting the kickback from the switching part and eliminating the glitches produced by the transition from phase 1 to phase 2. The other channels have only two sections: an amplifier that processes the errors with a gain of 10 and shifts the DC level, and the sample-and-hold.
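The zero-pole compensator $H(s)$ of the main path can be illustrated with its discrete-time equivalent. The sketch below applies the bilinear transform to $H(s) = K(1 + s/\omega_z)/(1 + s/\omega_p)$. This generic form, the 1 MHz sampling rate and the 5 kHz zero and 50 kHz pole are assumptions, since the chapter does not give the actual transfer function or capacitor values of the switched-capacitor filter.

```python
import math


class ZeroPoleFilter:
    """Bilinear-transform equivalent of H(s) = K (1 + s/wz) / (1 + s/wp)."""

    def __init__(self, k, f_zero, f_pole, f_sample):
        c = 2.0 * f_sample                 # 2/T in s = (2/T)(1 - z^-1)/(1 + z^-1)
        wz = 2.0 * math.pi * f_zero
        wp = 2.0 * math.pi * f_pole
        norm = 1.0 + c / wp
        self.b0 = k * (1.0 + c / wz) / norm
        self.b1 = k * (1.0 - c / wz) / norm
        self.a1 = (1.0 - c / wp) / norm
        self.x_prev = 0.0
        self.y_prev = 0.0

    def step(self, x):
        """One sample of y[n] = b0 x[n] + b1 x[n-1] - a1 y[n-1]."""
        y = self.b0 * x + self.b1 * self.x_prev - self.a1 * self.y_prev
        self.x_prev, self.y_prev = x, y
        return y


# Hypothetical values: K = 1, zero at 5 kHz, pole at 50 kHz, 1 MHz sampling.
filt = ZeroPoleFilter(k=1.0, f_zero=5e3, f_pole=50e3, f_sample=1e6)
response = [filt.step(1.0) for _ in range(2000)]   # unit step
print(f"first sample {response[0]:.2f}, settled value {response[-1]:.4f}")
```

The gain is $K$ at DC and $K\omega_p/\omega_z$ (here 10) at the Nyquist frequency, which is the expected behaviour of a lead-type zero-pole section.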



Fig. 22 Conceptual scheme of the analog processor and PWMs output pulses



Fig. 23 Analog processor first channel schematic diagram

Driving the main switches of the buck converter is straightforward, since they are connected to $V_{DD}$ or ground. By contrast, the load switches are controlled by a self-boosted driver, as shown in Fig. 24. The internal capacitor is $C_S = 170\text{ pF}$, with an external capacitance $C_{S1} = 430\text{ pF}$ in parallel. To ensure proper control of the $M_{Ni}$/$M_{Pi}$ pair, the logic signal provided by the analog processor is almost doubled by a charge pump (CP) [10].
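A first-order estimate shows how the dead-time current can charge the reservoir of the self-boosted driver, assuming that $C_S$ and $C_{S1}$ together (600 pF) form the reservoir. The 0.3 A inductor current, the 10 ns dead time and the 60 pF gate capacitance are hypothetical. The diode drop and the charge pump are ignored, and the reservoir is assumed to start empty.

```python
def reservoir_step(i_l, t_dead, c_s, c_s1):
    """Voltage gained by C_S || C_S1 when i_l flows into it for t_dead."""
    return i_l * t_dead / (c_s + c_s1)


def after_charge_sharing(v_res, c_res, c_gate):
    """Ideal charge sharing between the reservoir and an uncharged gate."""
    return v_res * c_res / (c_res + c_gate)


C_S, C_S1 = 170e-12, 430e-12                                        # values from the text
dv = reservoir_step(i_l=0.3, t_dead=10e-9, c_s=C_S, c_s1=C_S1)      # hypothetical I_L, dead time
v_gate = after_charge_sharing(dv, C_S + C_S1, c_gate=60e-12)        # hypothetical gate capacitance
print(f"reservoir step {dv:.2f} V, gate voltage after sharing {v_gate:.2f} V")
```

In practice the reservoir is topped up every period and the gate voltage is limited by the protection pads, so these numbers only indicate orders of magnitude.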

The circuit has been fabricated with a  $0.5\text{-}\mu\text{m}$  2-poly, 5-metal CMOS process. Figure 25 shows the chip microphotograph. The total area is  $3.5 \text{ mm} \times 3.8 \text{ mm}$ , with  $1.2 \text{ mm}^2$  used for analog processing.

Figure 26 shows three of the four output voltages and the switching node voltage waveform in steady state. The switching node waveform indicates good stability of the main loop. The main duty cycle in this case is about 60%.



**Fig. 24** Self-boosted switch drivers schematic diagram

**Fig. 25** Chip microphotograph



Figure 27 shows the output voltage ripple measurements: the four output waveforms in steady state, ac coupled, with a vertical scale of 50 mV. The maximum ripple is about 65 mV.

For the cross-regulation measurements, an input filter slows down the transient response of the converter to avoid transient cross-regulation drops of the output voltages. Nevertheless, the converter is quite fast, settling in about 80 $\mu$s.



Fig. 26 Measured output voltages



Fig. 27 Measured ac coupled output voltages

In the measurement shown in Fig. 28, $V_{out_2}$, $V_{out_3}$ and $V_{out_4}$ are set to their nominal voltage levels, while $V_{out_1}$ changes from 0.7 to 1.6 V and back.

Figure 29 shows $V_{out_4}$, the gate voltage of its load switch, the clock signal and the switching node voltage waveform. It can be seen that the self-boosted snubber circuit boosts the switch gate voltage up to 5.5 V, where it is clamped by the protection pads. In this condition, a ringing of about 80 mV peak on $V_{out4}$ has been observed during the load switch commutations.

Fig. 28 Cross-regulation measurement

Fig. 29 Measurement of the self-boosted drivers effectiveness

Fig. 30 Power efficiency of the 4-output buck converter

Figure 30 shows the measured power efficiency as a function of the fourth output current, with $I_{out1}$, $I_{out2}$ and $I_{out3}$ fixed at 200 mA, 100 mA, and 150 mA, respectively. The supply voltage is at its minimum value of 2.3 V. The peak efficiency is about 85% at 330 mA. Table 1 summarizes the chip performance.

**Table 1** Performance summary

| Parameter                               | Value                        |
|-----------------------------------------|------------------------------|
| Supply Voltage                          | 2.3 → 5 V                    |
| Output Voltages                         | 0 → ($V_{supply} - 0.5$) V   |
| Max Output Voltage                      | 3.6 V                        |
| Output Current (total)                  | 0.1 → 1.8 A                  |
| Output Current (single output)          | 0 → 0.8 A                    |
| Max Voltage Ripple                      | 90 mV                        |
| Max Cross-Regulation                    | 40 mV/V                      |
| Max Load-Regulation                     | 45 mV/V                      |
| Peak Efficiency ($V_{supply} = 2.3$ V)  | 85%                          |

## 7 Conclusions

The design of single-inductor multiple-output DC-DC converters is important for future low-power portable systems. The key issues and some possible solutions have been described in this chapter, and the examples provided demonstrate the feasibility and the limits of various approaches. Given the increasing diffusion of complex portable systems, this area is expected to develop significantly in the near future.

## References

1. N. Mohan, T.M. Undeland, and W.P. Robbins, "Power Electronics - Converters, Applications, and Design – Second Edition", John Wiley & Sons, Inc., Ch. 7.
2. B. Arbetter, R. Erickson, and D. Maksimovic, "DC-DC converter design for battery-operated systems", *IEEE Power Electronics Specialist Conference*, vol. 1, pp. 103–109, June 1995.
3. V. Kursun, S.G. Narendra, V.K. De, and E.G. Friedman, "Analysis of buck converters for on-chip integration with a dual supply voltage microprocessor", *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 11, pp. 514–522, June 2003.
4. D. Ma, W.-H. Ki, and C.-Y. Tsui, "A pseudo-CCM/DCM SIMO switching converter with free-wheel switching", *IEEE International Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pp. 390–476, Feb. 2002.
5. S.K. Hoon, N. Culp, J. Chen, and F. Maloberti, "A PWM dual-output DC/DC boost converter in a 0.13  $\mu\text{m}$  CMOS technology for cellular-phone backlight application", *Proc. of European Solid-State Circuits Conference (ESSCIRC)*, pp. 81–84, Sept. 2005.
6. E. Bonizzoni, F. Borghetti, P. Malcovati, F. Maloberti, and B. Niessen, "A 200 mA 93% Peak Efficiency Single-Inductor Dual-Output DC-DC Buck Converter", *IEEE International Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pp. 526–619, Feb. 2007.
7. M. Belloni, E. Bonizzoni, E. Kiseliolas, P. Malcovati, F. Maloberti, T. Peltola, and T. Teppo, "A 1.2A Output Current Single-Inductor 4-Outputs DC-DC Buck Converter with Self-Boosted Switch Drivers", *IEEE International Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pp. 444–626, Feb. 2008.
8. M. Belloni, E. Bonizzoni, and F. Maloberti, "On the Design of Single-Inductor Multiple-Output DC-DC Buck Converters", to appear in *Proc. of 2008 International Symposium on Circuits and Systems (ISCAS)*.
9. J. Dickson, "On-chip high voltage generation in NMOS integrated circuits using an improved voltage multiplier technique", *IEEE J. Solid-State Circ.*, vol. SC-11, no. 3, pp. 374–378, June 1976.
10. P. Favrat, P. Deval, and M.J. Declercq, "A new high efficiency CMOS voltage doubler", *Proc. of IEEE Custom Integrated Circuits Conference (CICC)*, pp. 259–262, May 1997.

# Enhanced Ripple Regulators

Richard Redl

**Abstract** The various ripple regulator types have a simple control structure, fast transient response, and (in some versions) a switching frequency that is proportional to the load current in discontinuous conduction mode. These characteristics make ripple regulators especially well suited for power-management applications in computers and portable electronic devices. Ripple regulators, however, also have some drawbacks, including (in some versions) a poorly defined switching frequency, noise-induced jitter, a tendency toward fast-scale instability, and inadequate dc regulation. This paper focuses on a novel ripple regulator architecture that improves the dc regulation, reduces the noise sensitivity, and also helps avoid fast-scale instability. The proposed architecture combines a dual-purpose (error and ripple) amplifier with a traditional (hysteretic, constant-on-time, constant-off-time, or constant-frequency) ripple regulator. The performance can be further improved by adding the so-called ramp pulse modulation technique, which substantially reduces the worst-case unloading overshoot in the constant-on-time version.

## 1 Introduction

Dc-dc converters using the output ripple voltage as the PWM ramp (commonly called “ripple regulators” [1]; the control technique itself is sometimes called “ripple-based control” [2]) have a simple control architecture and fast line, load, and reference transient responses. The hysteretic and constant-on-time versions also have a switching frequency that changes proportionally with the load in discontinuous conduction mode; this feature is useful in applications where high efficiency must be maintained over a wide range of load currents. These characteristics make ripple regulators especially attractive for computer and portable electronics applications. Ripple regulators, however, suffer from some serious practical problems, including a poorly defined switching frequency (in some variations of ripple-based control the frequency strongly depends on the parasitics of the output capacitor, circuit delays, and the input and output voltages), jittery behavior (caused by ambient electromagnetic noise, generated, for example, by another converter on the same circuit board), inadequate dc regulation, and a tendency toward fast-scale, or subharmonic, instability.

---

R. Redl (✉)  
ELFI S.A., Montévaux 14, CH-1726 Farvagny, Switzerland  
e-mail: rredl@freesurf.ch

The frequency dependence can be solved by introducing a timer that adaptively sets the on-time or off-time of the control switch, by feedforward frequency stabilization [3, 4], or by synchronizing the converter to an external clock [3–5, 13]. The jitter caused by ambient noise can be reduced by a switched noise filter [6]. The dc regulation can be improved by compensating some of the error components [7] or by introducing an integrating error amplifier between the reference voltage and the PWM comparator (“Vsquare” control [8, 9]). Unfortunately, Vsquare control is a patented technology, which prevents its general use; it also does not solve the noise sensitivity issue, because the PWM ramp remains small. Reference [10] discloses a hysteretic regulator that combines the error amplifier with a large artificial ramp, but that solution is complex and also proprietary.

This paper presents an enhancement for the basic ripple regulators that improves the dc regulation, reduces the noise sensitivity, and can also alleviate the fast-scale instability. The enhancement can be used with all types of ripple regulator architectures, including the hysteretic, constant-on-time and constant-off-time versions, and it can also be adapted to the constant-frequency versions. By further improving the constant-on-time version with the addition of the proprietary ramp pulse modulation technique [11] discussed in Section 4, the unloading overshoot can be significantly reduced.

## 2 Ripple Regulator Overview

Figure 1 shows the general block diagram of the ripple regulator.

The output voltage of the LC power filter is connected to the inverting input of the PWM comparator either directly or through a feedback divider. Optionally, the feedback divider includes a signal filter (e.g., a capacitor across one of the feedback resistors). The output of the PWM comparator changes state when the feedback voltage $v_{fb}$ crosses $V_{ref}$. The comparator output goes through a postprocessor circuit that sets one or more parameters (on-time, off-time, frequency, peak-to-peak ripple, etc.) of the converter. In some implementations (e.g., the hysteretic regulator [1] or ripple regulators using the switched noise filter [6]), the feedback voltage or the reference voltage is modified by a signal coming back from the postprocessor; this is represented by the dashed-line connection.

**Fig. 1** Ripple regulator block diagram

### 2.1 Hysteretic Regulator

Figure 2 shows a buck converter controlled by a hysteretic comparator. This circuit is the first, and possibly simplest, member of the ripple regulator family; its origin goes back to the 1950s. Its common name is “hysteretic regulator.”

The basic operation of the hysteretic regulator is as follows. When switch S is on, the inductor current $i_L(t)$ increases. This leads to an increase in the output voltage $v_{out}(t)$ and to a proportional increase in the feedback voltage $v_{fb}(t)$. When $v_{fb}(t)$ exceeds the upper threshold ($V_{ref} + V_H/2$) of the comparator, the output of the comparator goes low, which turns off switch S after a turn-off delay $T_{d(off)}$. Then $i_L(t)$ begins to decrease, causing a decrease in both $v_{out}(t)$ and $v_{fb}(t)$. Eventually $v_{fb}(t)$ drops below the lower threshold ($V_{ref} - V_H/2$) of the comparator, which changes the output of the comparator to high and turns on switch S after a turn-on delay $T_{d(on)}$.

A practically important characteristic of this regulator is the switching frequency, which strongly depends on the parasitics of the output capacitor and on the turn-on and turn-off delays [12]. The capacitor parasitics are the equivalent series resistance (“ESR,” $R_C$ in Fig. 2) and the equivalent series inductance (“ESL,” $L_C$ in Fig. 2). Besides the capacitor parasitics and the delays, the frequency also depends on the hysteresis voltage $V_H$, the division ratio of the feedback divider, the inductance L, and the input and output voltages. The capacitance C, however, has only a small effect on the frequency. Note that in continuous conduction mode (“CCM”) the switching frequency is a weak function of the load current, but in discontinuous conduction mode (“DCM”) it is essentially proportional to the load current, which is useful for maintaining high efficiency at light load.

**Fig. 2** Hysteretic regulator

The dc regulation depends on the time constant ( $R_C C$ ) of the output capacitor and on the turn-on and turn-off delays. In DCM, the dc output voltage is also a function of the load current [12].

In CCM, neglecting the effect of the capacitance, the switching frequency is

$$f = \frac{R_C}{V_{H(\text{eff})} L} \frac{(V_{\text{in}} - V_{\text{out}}) V_{\text{out}}}{V_{\text{in}}} \quad (1)$$

where

$$V_{H(\text{eff})} = V_H \left( 1 + \frac{R_1}{R_2} \right) - \frac{L_C}{L} V_{\text{in}} + T_{d(\text{off})} \frac{V_{\text{in}} - V_{\text{out}}}{L} R_C + T_{d(\text{on})} \frac{V_{\text{out}}}{L} R_C \quad (2)$$
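To show how strongly the parasitics enter Eqs. (1) and (2), the following minimal Python sketch evaluates them directly. The operating point $V_{in} = 5$ V, $V_{out} = 1.2$ V, $L = 500$ nH and $R_C = 5$ m$\Omega$ is borrowed from the simulation in Section 4; the hysteresis $V_H = 10$ mV, the delays of 20 ns and the ESL of 0.5 nH are hypothetical values chosen only for illustration, and no feedback divider is assumed ($R_1/R_2 = 0$).

```python
def v_h_eff(v_in, v_out, L, R_C, L_C, V_H, r1_over_r2, td_on, td_off):
    """Effective hysteresis of Eq. (2), referred to the output."""
    return (V_H * (1 + r1_over_r2)
            - (L_C / L) * v_in                     # ESL step shrinks the window
            + td_off * (v_in - v_out) / L * R_C    # overshoot during turn-off delay
            + td_on * v_out / L * R_C)             # undershoot during turn-on delay


def hysteretic_ccm_frequency(v_in, v_out, L, R_C, L_C, V_H,
                             r1_over_r2=0.0, td_on=0.0, td_off=0.0):
    """CCM switching frequency of Eq. (1), capacitance neglected."""
    vh = v_h_eff(v_in, v_out, L, R_C, L_C, V_H, r1_over_r2, td_on, td_off)
    if vh <= 0:
        raise ValueError("effective hysteresis must be positive")
    return R_C / (vh * L) * (v_in - v_out) * v_out / v_in


if __name__ == "__main__":
    # Hypothetical V_H, delays and ESL; V_in, V_out, L, R_C as in Sec. 4.
    params = dict(v_in=5.0, v_out=1.2, L=500e-9, R_C=5e-3, V_H=10e-3,
                  td_on=20e-9, td_off=20e-9)
    for L_C in (0.0, 0.5e-9):
        f = hysteretic_ccm_frequency(L_C=L_C, **params)
        print(f"L_C = {L_C * 1e9:.1f} nH -> f = {f / 1e3:.0f} kHz")
```

With these assumed values, adding only 0.5 nH of ESL shrinks $V_{H(\text{eff})}$ from 11 mV to 6 mV and moves the frequency from about 830 kHz to about 1.52 MHz, which illustrates why the CCM frequency of the plain hysteretic regulator is considered poorly defined.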

In DCM, the switching frequency is

$$f = \frac{1}{T_{\text{on}} + T_{\text{off1}} + T_{\text{off2}} + T_{d(\text{on})}} \quad (3)$$

where

$$T_{\text{on}} = \frac{L}{R_C} \frac{V_H \left( 1 + R_1/R_2 \right) - (V_{\text{in}} - V_{\text{out}}) L_C / L}{V_{\text{in}} - V_{\text{out}}} + T_{d(\text{off})} \quad (4)$$

$$T_{\text{off1}} = \frac{I_{\text{peak}} L}{V_{\text{out}}} \quad (5)$$

$$T_{\text{off2}} = \frac{\Delta V_C}{I_{\text{out}}} C \quad (6)$$

$$I_{\text{peak}} = \frac{T_{\text{on}}}{L} (V_{\text{in}} - V_{\text{out}}) \quad (7)$$

$$\Delta V_C = \left( \frac{I_{\text{peak}}}{2} - I_{\text{out}} \right) \frac{T_{\text{on}} + T_{\text{off1}}}{C} \quad (8)$$
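The DCM set of Eqs. (3)–(8) can be evaluated step by step in the same way. Substituting (6) and (8) into (3) shows that the period reduces to $I_{peak}(T_{on} + T_{off1})/(2 I_{out}) + T_{d(on)}$, i.e., the charge delivered per pulse divided by the load current, so the frequency is nearly proportional to $I_{out}$ and C cancels out. The sketch below assumes the same hypothetical parameters as the CCM example ($V_H = 10$ mV, $L_C = 0.5$ nH, 20 ns delays) and an assumed $C = 500$ $\mu$F.

```python
def hysteretic_dcm_frequency(v_in, v_out, i_out, L, C, R_C, L_C, V_H,
                             r1_over_r2=0.0, td_on=0.0, td_off=0.0):
    """DCM switching frequency of the hysteretic regulator, Eqs. (3)-(8)."""
    t_on = ((L / R_C)
            * (V_H * (1 + r1_over_r2) - (v_in - v_out) * L_C / L)
            / (v_in - v_out) + td_off)                    # Eq. (4)
    i_peak = t_on / L * (v_in - v_out)                    # Eq. (7)
    t_off1 = i_peak * L / v_out                           # Eq. (5): current ramps to zero
    dv_c = (i_peak / 2 - i_out) * (t_on + t_off1) / C     # Eq. (8): net capacitor charge
    t_off2 = dv_c / i_out * C                             # Eq. (6): load discharges C
    if t_off2 <= 0:
        raise ValueError("load too heavy for DCM (i_out >= i_peak / 2)")
    return 1.0 / (t_on + t_off1 + t_off2 + td_on)         # Eq. (3)


if __name__ == "__main__":
    # Hypothetical component values (see lead-in).
    common = dict(v_in=5.0, v_out=1.2, L=500e-9, C=500e-6, R_C=5e-3,
                  L_C=0.5e-9, V_H=10e-3, td_on=20e-9, td_off=20e-9)
    for i_out in (0.05, 0.1, 0.2):
        f = hysteretic_dcm_frequency(i_out=i_out, **common)
        print(f"I_out = {i_out:.2f} A -> f = {f / 1e3:.0f} kHz")
```

For these assumed values the frequency comes out at roughly 94, 188 and 374 kHz for 50, 100 and 200 mA, i.e., almost exactly proportional to the load, with the small deviation caused by $T_{d(on)}$.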

The hysteretic regulator is essentially free from fast-scale instability. Its biggest practical limitation is the poorly defined switching frequency in CCM.

### 2.2 Constant-Off-Time and Constant-On-Time Ripple Regulators

Figure 3 shows two ripple regulator architectures in which the frequency is independent of the capacitor parasitics and the feedback divider. In CCM the frequency is also independent of the inductance of the output inductor.

A well-defined switching frequency is achieved by inserting a timer, or monostable multivibrator (MMV), between the PWM comparator and switch S. In the constant-off-time peak-voltage regulator (PVR), when the feedback voltage $v_{fb}(t)$ rises above $V_{ref}$, the timer is triggered and turns off switch S for a set time $T_{off(set)}$. The switch turns back on when $T_{off(set)}$ expires. In the constant-on-time valley-voltage regulator (VVR), when the feedback voltage $v_{fb}(t)$ falls below $V_{ref}$, the timer is triggered and turns switch S on for a set time $T_{on(set)}$. The switch is then turned off when $T_{on(set)}$ expires. Table 1 shows the switching frequencies of these regulators in CCM and DCM.



**Fig. 3** Constant-off-time peak-voltage regulator (PVR) and constant-on-time valley-voltage regulator (VVR)

**Table 1** Switching frequencies of the constant- $T_{off}$  PVR and constant- $T_{on}$  VVR

| Mode | Constant-off-time PVR | Constant-on-time VVR |
|------|-----------------------|----------------------|
| CCM | $f = \frac{V_{in} - V_{out}}{V_{in}T_{off}}$ | $f = \frac{V_{out}}{V_{in}T_{on}}$ |
| DCM | $f = \frac{1}{T_{off} + T_{on(dcm)}}$, where $T_{on(dcm)} = \frac{L V_{out} I_{out} \left( 1 + \sqrt{1 + \frac{2(V_{in} - V_{out}) V_{in} T_{off}}{L V_{out} I_{out}}} \right)}{(V_{in} - V_{out}) V_{in}}$ | $f = \frac{2L I_{out} V_{out}}{T_{on}^2 (V_{in} - V_{out}) V_{in}}$ |

In the table  $T_{off} = T_{off(set)} + T_{d(on)} - T_{d(off)}$  and  $T_{on} = T_{on(set)} - T_{d(on)} + T_{d(off)}$ .
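As a consistency check of Table 1, the sketch below codes the four frequency expressions and the delay-corrected $T_{on}$ and $T_{off}$, then confirms that the CCM and DCM formulas give the same frequency at the mode boundary, where the load current equals half the inductor ripple. The operating point (5 V to 1.2 V, 500 nH) follows the Section 4 simulation; $T_{on} = 600$ ns and $T_{off} = 1.9$ $\mu$s are hypothetical values that both give 400 kHz in CCM.

```python
import math


def effective_times(t_on_set, t_off_set, td_on, td_off):
    """Delay-corrected T_on and T_off as defined below Table 1."""
    t_on = t_on_set - td_on + td_off
    t_off = t_off_set + td_on - td_off
    return t_on, t_off


def f_pvr_ccm(v_in, v_out, t_off):
    return (v_in - v_out) / (v_in * t_off)


def f_pvr_dcm(v_in, v_out, i_out, L, t_off):
    x = L * v_out * i_out
    t_on_dcm = (x * (1 + math.sqrt(1 + 2 * (v_in - v_out) * v_in * t_off / x))
                / ((v_in - v_out) * v_in))
    return 1.0 / (t_off + t_on_dcm)


def f_vvr_ccm(v_in, v_out, t_on):
    return v_out / (v_in * t_on)


def f_vvr_dcm(v_in, v_out, i_out, L, t_on):
    return 2 * L * i_out * v_out / (t_on ** 2 * (v_in - v_out) * v_in)


if __name__ == "__main__":
    v_in, v_out, L = 5.0, 1.2, 500e-9
    t_on, t_off = 600e-9, 1.9e-6          # hypothetical, 400 kHz in CCM
    i_b_vvr = (v_in - v_out) * t_on / (2 * L)   # half the on-time ripple
    i_b_pvr = v_out * t_off / (2 * L)           # half the off-time ripple
    print(f_vvr_ccm(v_in, v_out, t_on), f_vvr_dcm(v_in, v_out, i_b_vvr, L, t_on))
    print(f_pvr_ccm(v_in, v_out, t_off), f_pvr_dcm(v_in, v_out, i_b_pvr, L, t_off))
```

Both boundary currents happen to be 2.28 A at this operating point, and in each row the two printed frequencies coincide; below that load the VVR frequency in DCM scales linearly with $I_{out}$.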



**Fig. 4** Implementing frequency stabilization of the constant-$T_{on}$ VVR in CCM

By inspecting the frequency expressions of the constant-on-time VVR in Table 1, we can conclude that (1) in CCM the frequency can be stabilized by making $T_{on}$ inversely proportional to $V_{in}$ and (2) in DCM the frequency is proportional to the output current. Both of these features are quite useful in practice, which explains the popularity of the constant-on-time VVR. Figure 4 shows how the frequency stabilization in CCM is implemented.

The idea is to make the charging current of the capacitor $C_t$ in the timer section proportional to the input voltage. The result is an on-time

$$T_{on(set)} = \frac{C_t V_{th}}{g V_{in}} \quad (9)$$

and a switching frequency (neglecting the delay time difference)

$$f = \frac{V_{out}}{V_{in} T_{on(set)}} = \frac{g V_{out}}{C_t V_{th}} \quad (10)$$

As can be seen, the frequency is now independent of $V_{in}$.
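A short numerical check of Eqs. (9) and (10): the sketch uses the timer values of the Section 4 simulation ($C_t = 100$ pF, $g = 100$ $\mu$A/V, a 3 V threshold, $V_{out} = 1.2$ V); the input voltages 12 V and 19 V are hypothetical additions to show the feedforward effect.

```python
def t_on_set(c_t, v_th, g, v_in):
    """Eq. (9): time for a current g*V_in to charge C_t up to V_th."""
    return c_t * v_th / (g * v_in)


def vvr_ccm_frequency(v_out, v_in, t_on):
    """CCM frequency of the constant-on-time VVR (Table 1), delays neglected."""
    return v_out / (v_in * t_on)


if __name__ == "__main__":
    c_t, v_th, g, v_out = 100e-12, 3.0, 100e-6, 1.2
    for v_in in (5.0, 12.0, 19.0):
        t_on = t_on_set(c_t, v_th, g, v_in)
        f = vvr_ccm_frequency(v_out, v_in, t_on)
        print(f"V_in = {v_in:4.1f} V: T_on = {t_on * 1e9:6.1f} ns, f = {f / 1e3:.1f} kHz")
```

The on-time shrinks from 600 ns at 5 V to about 158 ns at 19 V, while the frequency stays at $g V_{out}/(C_t V_{th}) = 400$ kHz in every case, as Eq. (10) predicts.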

Note that the constant-$T_{off}$ and constant-$T_{on}$ regulators exhibit fast-scale instability when the $R_C C$ time constant of the output capacitor is less than one-half of the set off-time or on-time [12]. Adding a second, low-time-constant capacitor to the output aggravates this problem.
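The stability condition above, $R_C C \geq T_{set}/2$, is easy to check numerically. The sketch below applies it to a single output capacitor only (it does not model the parallel combination of two capacitors). The bulk capacitor (500 $\mu$F, 5 m$\Omega$) and the 600 ns on-time correspond to the Section 4 simulation values; the 100 $\mu$F, 2 m$\Omega$ ceramic capacitor is hypothetical.

```python
def is_fast_scale_stable(r_c, c_out, t_set):
    """True if the capacitor time constant satisfies R_C*C >= t_set/2 (rule from [12])."""
    return r_c * c_out >= t_set / 2.0


def min_esr_for_stability(c_out, t_set):
    """Smallest ESR that meets R_C*C >= t_set/2 for a given capacitance."""
    return t_set / (2.0 * c_out)


if __name__ == "__main__":
    t_on = 600e-9
    print(is_fast_scale_stable(5e-3, 500e-6, t_on))   # bulk capacitor: 2.5 us >= 0.3 us
    print(is_fast_scale_stable(2e-3, 100e-6, t_on))   # hypothetical ceramic: 0.2 us < 0.3 us
    print(min_esr_for_stability(500e-6, t_on))        # about 0.6 milliohm
```

The bulk capacitor passes with a wide margin, whereas the low-ESR ceramic part on its own would violate the condition, which is the situation the enhanced regulator of Section 3 is designed to tolerate.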

### 2.3 Constant-Frequency Ripple Regulators

The frequency of the ripple regulators can also be directly synchronized to an external clock. Figure 5 shows how to do this. (Note: [13] presents a similar solution, but without the stabilizing ramp.)



**Fig. 5** Constant-frequency PVR and VVR

In the PVR the clock pulse turns switch $S$ on, and in the VVR the clock pulse turns it off. The switch changes state when the feedback voltage goes above (PVR) or below (VVR) the reference voltage $V_{ref}$. To avoid fast-scale instability above (PVR) or below (VVR) 50% duty ratio, a small ramp voltage $v_{ramp}(t)$ must be added to $v_{fb}(t)$ (PVR) or to $V_{ref}$ (VVR); also, the time constant of the output capacitor must be above a certain value. Reference [12] discusses how to select the capacitor time constant and the ramp amplitude to ensure stable operation.

### 2.4 Considerations for Dc Regulation

Besides the obvious factors that influence the initial accuracy, the dc regulation of the previously discussed ripple regulators depends mostly on the actual shape and peak-to-peak value of the output ripple voltage and on the turn-off or turn-on delays. In the case of the constant-frequency regulators the presence of the stabilizing ramp also contributes to the dc regulation error.

It is possible to determine mathematically how the various parameters of the regulator influence the dc output voltage, but the derivations can be rather tedious. In any case, it is reasonable to assume that the total variation of the dc output voltage is not more than the peak-to-peak ripple, except when the turn-on and/or turn-off delays are excessively long or when a stabilizing ramp is employed.

For applications with a high dc accuracy requirement (e.g., voltage regulators feeding microprocessors or DSPs), the basic ripple regulators might not provide acceptably tight dc regulation. In those cases the factors that influence the dc output voltage can be compensated, e.g., by using an approach similar to the one discussed in [7], or by adding a high-dc-gain error amplifier to the converter, as discussed below.

### 2.5 Vsquare Ripple Regulator

Figure 6 shows how an error amplifier with high dc gain can be used to improve the dc regulation of the basic ripple regulators [8]. The amplifier is inserted between the reference voltage and the corresponding input of the PWM comparator. It monitors the difference between the second feedback voltage $v_{fb2}(t)$ and $V_{ref}$ and provides an error voltage $v_{err}(t)$ for the non-inverting input of the PWM comparator. The error voltage varies slightly as needed to maintain perfect dc regulation, while the fundamental operation of the parent ripple regulator remains virtually unchanged. This technique, called Vsquare regulation, is effective and easy to implement; unfortunately, it is patented [9] and therefore cannot be used without a license. Also, Vsquare does not help with some of the other deficiencies of the ripple regulator (the small PWM ramp and the related noise sensitivity, or fast-scale instability).



**Fig. 6** Vsquare ripple regulator

## 3 Enhanced Ripple Regulators with Combined Voltage-Error and Ripple Amplifier

Figure 7 shows an alternative solution to the dc regulation problem of the basic ripple regulators: the enhanced ripple regulator.

The main difference between the Vsquare regulator and the enhanced ripple regulator of Fig. 7 is that here there is only one feedback path, via the amplifier. That amplifier serves two purposes: it acts as a high-dc-gain voltage-error amplifier for nearly ideal dc regulation, and it also amplifies and shapes the ripple voltage that constitutes the ramp for the PWM comparator.



**Fig. 7** Enhanced ripple regulator with combined voltage-error and ripple amplifier

The PWM implementation can be any of the previously discussed ripple-regulator variations (hysteretic, constant-$T_{off}$ and constant-frequency PVR, constant-$T_{on}$ and constant-frequency VVR). In the schematic of Fig. 7 the voltage $V_1$ serves only as a bias voltage that positions the output of the amplifier within a convenient range.

### 3.1 Amplifier and Loop Compensation Considerations

The compensation of the amplifier should be selected such that the correct switching frequency (in the hysteretic version) is ensured, or the fast-scale instability (in the constant- $T_{on}$ , constant- $T_{off}$ , or constant-frequency versions) is eliminated. The frequency response of the amplifier is as follows.

$$A(s) = -\frac{1}{s(C_A + C_B)R_1} \frac{(1 + sR_1C_1)(1 + sR_AC_A)}{1 + sR_A \frac{C_AC_B}{C_A + C_B}} \quad (11)$$
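Eq. (11) can be evaluated directly with complex arithmetic. The sketch below uses the amplifier values of the Section 4 simulation ($R_1 = 1$ k$\Omega$, $R_A = 50$ k$\Omega$, $C_A = 200$ pF, $C_B = 0$); $C_1$ is not listed there, so it is assumed absent ($C_1 = 0$).

```python
import math


def amp_response(s, r1, ra, ca, cb=0.0, c1=0.0):
    """A(s) of Eq. (11) for a (possibly complex) Laplace variable s."""
    integrator = -1.0 / (s * (ca + cb) * r1)
    zeros = (1 + s * r1 * c1) * (1 + s * ra * ca)
    pole = 1 + s * ra * ca * cb / (ca + cb)
    return integrator * zeros / pole


if __name__ == "__main__":
    r1, ra, ca = 1e3, 50e3, 200e-12
    f_zero = 1 / (2 * math.pi * ra * ca)      # R_A C_A zero
    print(f"R_A C_A zero: {f_zero / 1e3:.1f} kHz")
    for f in (1e3, f_zero, 400e3):
        a = amp_response(2j * math.pi * f, r1, ra, ca)
        print(f"{f / 1e3:8.1f} kHz: |A| = {abs(a):7.2f}")
```

The $R_A C_A$ zero sits at about 15.9 kHz, well below the 400 kHz switching frequency implied by the same simulation values, so at the switching frequency $|A| \approx 50$, i.e., the ripple gain is set by the $R_A/R_1$ ratio, while the integrator provides the high dc gain.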

By properly positioning the poles and zeros, the effect of the parasitic inductance (ESL) on the shape of the amplified ripple can be eliminated. This is important for hysteretic regulators that are designed to operate in a given frequency range. The compensation can also reduce the destabilizing effect of an output-capacitor $R_C C$ time constant that is smaller than required, or of a low-time-constant capacitor in parallel with the bulk output capacitor. For example, by omitting $C_1$ and adding the capacitor $C_B$ such that the second pole of the amplifier coincides with the ESL zero ($R_C/(2\pi L_C)$) in the impedance of the capacitor, the amplified ripple will be free of the inductive ripple component. This eliminates the effect of the capacitor ESL on the switching frequency. By using $C_1$ in parallel with $R_1$ such that the first zero in the frequency response is below the ESR zero of the output capacitor ($1/(2\pi R_C C)$), the constant-off-time or constant-on-time regulators will be free of fast-scale instability even with $R_C C$ time constants that are less than one-half of the set off-time or on-time.
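The two placement rules just described can be turned into simple design formulas. Setting the second-pole time constant of Eq. (11), $R_A C_A C_B/(C_A + C_B)$, equal to $L_C/R_C$ gives $C_B$; requiring $1/(2\pi R_1 C_1) \le 1/(2\pi R_C C)$ gives a lower bound on $C_1$. In the sketch below, the amplifier and capacitor values come from Section 4, while the ESL of 0.5 nH is a hypothetical value (the simulation used $L_C = 0$). The two options are alternatives, not meant to be combined.

```python
import math


def cb_for_esl_cancellation(ra, ca, r_c, l_c):
    """C_B placing the second pole of Eq. (11) on the ESL zero R_C/(2*pi*L_C)."""
    tau = l_c / r_c
    if ra * ca <= tau:
        raise ValueError("R_A*C_A must exceed L_C/R_C")
    return tau * ca / (ra * ca - tau)


def second_pole_hz(ra, ca, cb):
    """Frequency of the second pole of Eq. (11)."""
    return (ca + cb) / (2 * math.pi * ra * ca * cb)


def c1_min_for_esr(r1, r_c, c_out):
    """Smallest C_1 whose zero 1/(2*pi*R_1*C_1) does not exceed the ESR zero."""
    return r_c * c_out / r1


if __name__ == "__main__":
    ra, ca, r1 = 50e3, 200e-12, 1e3
    r_c, c_out = 5e-3, 500e-6
    l_c = 0.5e-9                      # hypothetical ESL
    cb = cb_for_esl_cancellation(ra, ca, r_c, l_c)
    print(f"C_B = {cb * 1e12:.2f} pF, pole = {second_pole_hz(ra, ca, cb) / 1e6:.2f} MHz, "
          f"ESL zero = {r_c / (2 * math.pi * l_c) / 1e6:.2f} MHz")
    print(f"C_1 >= {c1_min_for_esr(r1, r_c, c_out) * 1e9:.2f} nF")
```

For these values $C_B$ comes out at about 2 pF (pole and ESL zero both near 1.59 MHz), and $C_1$ must be at least 2.5 nF.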

The  $R_A C_A$  time constant should be selected such that the corresponding zero frequency is around or below the switching frequency, so that it does not exert much influence on the shape of the amplified ripple. It is to be noted that the regulator works well with zero frequencies much below the switching frequency but this slows down the dynamic correction of deviations caused by sudden input-voltage or load-current steps.

If $C_A$ is selected such that its impedance at the switching frequency is much less than that of $R_A$, and if $C_1$ and $C_B$ are not present, the amplification of the ripple component will be determined solely by the $R_A/R_1$ ratio. There is no hard rule for selecting that ratio. If the ratio is large, the amplifier must have a high open-loop bandwidth, but the converter will provide tighter dynamic regulation; if the ratio is small, the required open-loop bandwidth will be reduced, but the dynamic regulation will be poorer. A ratio of about 10 to 50 seems satisfactory in most practical cases. Note also that if RPM (discussed in the next section) is used, a higher $R_A/R_1$ ratio tends to reduce the unloading overshoot more effectively.

## 4 Ramp Pulse Modulation (RPM)

Ramp pulse modulation, or RPM, is a proprietary improvement for the enhanced ripple regulator with constant-on-time control [11].

Ripple regulators with constant-on-time control are popular in battery-powered applications because in DCM the switching frequency becomes proportional to the load current, which ensures high efficiency at light loads. Another reason for their popularity is that, by using input-voltage feedforward in CCM, the frequency can be made quasi-constant, which is important for well-controlled switching losses and predictable EMI. A known drawback of constant-$T_{on}$ control, however, is that when the load current steps down during the on-time, the controller cannot terminate the conduction of the control switch until $T_{on}$ expires. This increases the overshoot of the output voltage. The voltage-error/ripple amplifier of the enhanced version does not overcome this limitation, but RPM does. RPM retains the advantages of the enhanced ripple regulator with constant-$T_{on}$ control, but it also reduces the overshoot: by using $v_{err}(t)$ as the threshold of the timing ramp, it makes $T_{on}$ adaptively shorter during a transient. Figure 8 shows the implementation of RPM. Figure 9 shows the simulated waveforms of the buck converter with and without RPM control when a worst-case unloading transient happens (simulation parameters: $V_{in} = 5$ V, $V_{ref} = 1.2$ V, $I_{out} = 15$ A $\rightarrow$ 5 A, $L = 500$ nH, $C = 500$ $\mu$F, $L_C = 0$, $R_C = 5$ m$\Omega$, $R_1 = 1$ k$\Omega$, no $R_2$, $R_A = 50$ k$\Omega$, $C_A = 200$ pF, $C_B = 0$, $C_t = 100$ pF, $g = 100$ $\mu$A/V, $V_1 = 4$ V; for the case without RPM, the threshold voltage for the comparator of the on-time setting section is 3 V). As can be seen, in this particular example the presence of RPM reduced the unloading overshoot by about 34 mV, which is 2.8% of the output voltage.

**Fig. 8** Ramp pulse modulation in the enhanced ripple regulator with constant-on-time control

**Fig. 9** Worst-case unloading transient responses of the enhanced ripple regulator with or without RPM (see details in the text)
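The RPM principle of terminating the on-time when the timing ramp reaches $v_{err}(t)$ rather than a fixed threshold can be illustrated with a small time-stepping sketch. The ramp slope $g V_{in}/C_t$ uses the simulation values above. The $v_{err}(t)$ waveform is purely hypothetical: it starts at the 3 V fixed threshold and is assumed to fall at 2 V/$\mu$s during an unloading transient. This is an illustration of the idea, not the proprietary circuit of [11].

```python
def on_time(v_in, g, c_t, threshold, dt=0.1e-9, t_max=5e-6):
    """First time at which the timer ramp g*V_in*t/C_t reaches threshold(t)."""
    slope = g * v_in / c_t                  # ramp slope in V/s (input-voltage feedforward)
    for n in range(int(t_max / dt) + 1):
        t = n * dt
        if slope * t >= threshold(t):
            return t
    raise RuntimeError("ramp did not reach the threshold within t_max")


if __name__ == "__main__":
    v_in, g, c_t = 5.0, 100e-6, 100e-12
    fixed = on_time(v_in, g, c_t, lambda t: 3.0)             # conventional constant on-time
    rpm = on_time(v_in, g, c_t, lambda t: 3.0 - 2e6 * t)     # hypothetical falling v_err
    print(f"fixed threshold: {fixed * 1e9:.1f} ns, RPM threshold: {rpm * 1e9:.1f} ns")
```

With a fixed 3 V threshold the pulse lasts 600 ns; with the assumed falling threshold the ramp meets it after about 429 ns, so less charge is pushed into the output capacitor during the unloading step.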

### 4.1 Experimental Results

RPM control has been implemented in several integrated circuits, including the ADP3209, a multimode PWM controller with diode emulation and voltage positioning capabilities [14]. Figure 10 shows the block schematic of that device.

Figure 11 shows steady-state experimental waveforms taken from a buck converter controlled by the ADP3209 IC. (Note that the actual ramp voltage is internal to the IC and cannot be observed from outside; the ramp voltage waveform in Fig. 11 was manually added to the scope photo.) Figure 12 shows the error voltage, the switched-node voltage, and the output ripple voltage of the converter in CCM (5-A load, 355-kHz switching frequency, left side) and in DCM (1-A load, 155-kHz switching frequency, right side). As can be seen, the inherent proportionality between the switching frequency and the load current in DCM, which is a fundamental property of the constant-on-time ripple regulator, is retained with RPM control.

**Fig. 10** Block schematic of the ADP3209

**Fig. 11** Steady-state waveforms of a buck converter controlled by the ADP3209

**Fig. 12** Waveforms of the buck converter in CCM (*left*) and in DCM (*right*)

## 5 Conclusions

After a brief review of the traditional (hysteretic, constant-on-time, constant-off-time, or constant-frequency) ripple regulators, this paper introduced a novel ripple regulator architecture that improves the dc regulation, reduces noise sensitivity, and also helps avoid fast-scale instability. The idea is to add a dual-purpose (error and ripple) amplifier in the feedback path. Further performance improvement can be achieved by including the so-called ramp pulse modulation technique, which substantially reduces the worst-case unloading overshoot of the constant-on-time ripple regulator. These techniques have been successfully implemented in commercially available integrated circuits.

## References

1. G. Wester, "Describing-function analysis of a ripple regulator with slew-rate limits and time delays," PESC 1990 Record, pp. 341–346.
2. J. Sun, "Characterization and performance comparison of ripple-based control methods for voltage regulator modules," PESC 2004 Record, pp. 3713–3720.
3. R. Redl and N. O. Sokal, "Frequency stabilization and synchronization of free-running current-mode-controlled converters," PESC 1986 Record, pp. 519–530.
4. T. Szepesi, "Stabilizing the frequency of hysteretic current mode dc/dc converters," PESC 1986 Record, pp. 550–559.
5. C.-H. Tso and J.-C. Wu, "A ripple control buck regulator with fixed output frequency," IEEE Power Electronics Letters, vol. 1, no. 3, pp. 61–63, September 2003.
6. R. Redl and G. Reizik, "Switched noise filter for the buck converter using the output ripple as the PWM ramp," Proceedings of APEC 2005, pp. 918–924.
7. C. Song and J. L. Nilles, "Multiple-phase high-accuracy hysteretic current-mode voltage regulator for powering microprocessors," Proceedings of APEC 2008, pp. 517–522.
8. D. Goder and W. R. Pelletier, "V2 architecture provides ultra-fast transient response in switch mode power supplies," HFPC Power Conversion – September 1996 Proceedings, pp. 19–23.
9. D. Goder, "Switching regulator," U.S. Patent 5,770,940.
10. M. M. Walters et al., "Synthetic ripple regulator," U.S. Patent 6,791,306.
11. T. F. Schiff, "Switching power supply control," U.S. Patent Application 20060055385.
12. R. Redl, "Ripple regulator review," Professional Education Seminar S2, APEC 2008.
13. K. Lee, K. Yao, X. Zhang, Y. Qiu, and F. C. Lee, "A novel control method for multiphase voltage regulators," Proceedings of APEC 2003.
14. ADP3209 datasheet, ON Semiconductor.

# Robust DCDC Converter for Automotive Applications

Ivan Koudar

**Abstract** A step-up/step-down non-isolated DCDC converter using Average Current Mode (ACM) control is described. Specific automotive requirements and the corresponding DCDC topology adaptations are discussed. Several topologies are compared, and the motivation for choosing ACM is explained. The topology of the whole converter is shown with reference to the DCDC die floor plan. Specific sensitive blocks are elaborated in more detail, such as the accurate coil current sensing, for which the schematics of the structures common to all three modes (boost, buck-boost, and buck) are described. The block layout is also depicted. Measurements on manufactured silicon verify the functionality and the high robustness of the DCDC converter for the automotive field. EMC results, which fulfil the required automotive standards, are included.

## 1 Power Management – Automotive Specific Requirements

Effective power management in modern car electronics systems is a necessity, especially as increasing numbers of sensitive electronic systems require a stable low voltage supply source.

### 1.1 General Automotive Requirements

The car battery voltage frequently has to be converted to the level required by a particular electronic device. The battery line voltage can swing very fast between 4 V and 40 V during engine starting, while all the sensitive electronics (sensors, diagnostics, etc.) and the overall car communication networks (CAN, LIN, etc.) must stay operational. Conversely, a very slow battery voltage ramp of a few volts is typical of the battery charging process, during which all electronic systems must also stay operational.

---

I. Koudar (✉)  
AMI Semiconductor, Czech Republic  
e-mail: Ivan.koudar@amis.com

A gateway is a typical example of an SOC in which the DCDC converter is integrated. An important characteristic of the gateway SOC is the integration of a high number of CAN and LIN communication channels. These channels introduce very fast load current changes, which must be absorbed by the DCDC converter without disturbing the sensitive electronics. In addition, high conversion efficiency and overall robustness are required. The automotive environment also has a very demanding temperature range: the typical required range is $-45^{\circ}\text{C}$ up to $150^{\circ}\text{C}$ ambient, and some special modules (mounted on the engine body) even require a temperature range approaching $200^{\circ}\text{C}$.

### 1.2 DCDC Converter Efficiency

Converter efficiency is the next important feature. One could argue that the power consumption of car electronics, and hence the DCDC efficiency, is insignificant compared with the efficiency of headlamps or other car power actuators. The reality is more complex. Integrating the DCDC converter on SOC silicon requires very high efficiency, because the high ambient temperature together with the package thermal resistance ($R_t$) defines the maximum allowable power loss. This requirement is tightened further by car-industry price pressure, which calls for low-cost packages with poor $R_t$ values (30–60 K/W).
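The thermal argument can be made concrete with the usual budget $P_{loss,max} = (T_{j,max} - T_{amb})/R_t$ and $\eta = P_{out}/(P_{out} + P_{loss})$. In the sketch below, the 150 °C ambient and the 30–60 K/W range come from the text; the junction limit of 175 °C, the 1 W output power and the 0.2 W dissipated by other on-chip blocks are hypothetical, and all converter losses are conservatively assumed to be dissipated in the package.

```python
def max_loss(t_j_max, t_amb, r_th):
    """Maximum package dissipation (W) for a junction limit t_j_max (degC)."""
    return (t_j_max - t_amb) / r_th


def min_efficiency(p_out, t_j_max, t_amb, r_th, p_other=0.0):
    """Lowest DCDC efficiency that keeps the die below t_j_max.

    Assumes all converter losses end up in the package and that other
    on-chip blocks already dissipate p_other watts.
    """
    budget = max_loss(t_j_max, t_amb, r_th) - p_other
    if budget <= 0:
        raise ValueError("no thermal budget left for the converter")
    return p_out / (p_out + budget)


if __name__ == "__main__":
    for r_th in (30.0, 60.0):   # package range quoted in the text, K/W
        eta = min_efficiency(p_out=1.0, t_j_max=175.0, t_amb=150.0,
                             r_th=r_th, p_other=0.2)
        print(f"R_t = {r_th:.0f} K/W -> efficiency >= {eta:.1%}")
```

Under these assumptions the required efficiency rises from about 61% to about 82% as the package degrades from 30 to 60 K/W, and it grows further as the other SOC blocks consume more of the thermal budget.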

### 1.3 EMC Radiation and Susceptibility

Automotive EMC requirements are very strict. Not only do international standards apply, but many car-manufacturer-specific limits are applicable as well. A DCDC converter, being a switching-mode power supply, is very likely to generate wideband RF radiated and conducted emissions. Typical conducted emission results with a visible DCDC contribution are depicted in Fig. 1.



**Fig. 1** Example of DCDC conducted emissions measured on input battery node (*left*) and on the DCDC output voltage (*right*)

An important aspect is the PWM frequency. A fixed PWM frequency is preferred in automotive applications, mainly because of its predictable EMC behaviour, in contrast to hysteretic DCDC converter topologies, whose switching frequency varies with operating conditions and which have very poor EMC performance.

### 1.4 Transient Over Voltage Pulses

A set of special test pulses is standardized for automotive electronic modules. Their purpose is to emulate fast power pulses coupled to a module through parasitic capacitances or inductances, as well as unwanted ESD stresses during manufacturing and maintenance. A typical over voltage pulse occurs due to sudden switching of a current, e.g., when an electrically controlled window starts or stops. An example of over voltage pulses applied on a power supply line is depicted in Fig. 2. The full set of pulses can be found in DIN 40839 parts 1 and 3 and ISO 7637 parts 1 and 3. Electronic module qualification strictly requires the module to survive all prescribed pulse tests, and some critical modules must even stay fully operational, without any parameter deviation, during the test pulses. Chip-level ESD requirements start at 2 kV HBM (Human Body Model), and 8 kV HBM is increasingly required by OEM car manufacturers. Typical ESD levels for MM (Machine Model) and CDM (Charged Device Model) are 200 V and 750 V, respectively.



| Pulse type | Peak voltage ($V_{pulse}$) | Cycles or duration |
|------------|----------------------------|--------------------|
| 1          | -100 V                     | 10                 |
| 2          | +100 V                     | 10                 |
| 3a         | -150 V                     | 1 h                |
| 3b         | +100 V                     |                    |
| 5b         | +40 V                      | 5                  |

**Fig. 2** Example of the over voltage pulses applied on power supply lines

## 2 DCDC Converter Topologies for Automotive Applications

There are about 14 basic topologies [1] commonly used to implement DCDC converters. Each topology has its unique properties and is best suited to a particular application [1, 2]. Only a limited set of DCDC topologies can fulfil the strict automotive requirements. The most frequently used are voltage mode control, peak current mode control, and average current mode control.

### 2.1 Voltage Mode (VM)

Voltage mode control is based on a single regulation loop. This topology is relatively simple, but it has some severe limitations. The main limitation of VM control is its limited response speed to fast battery voltage changes.

The regulation loop bandwidth (BW) is restricted to $BW < \frac{1}{2\pi\sqrt{LC}}$, the resonant frequency of the output LC filter. This constraint leads to output voltage dips during fast battery voltage fluctuations. A typical non-isolated voltage mode DCDC topology is displayed in Fig. 3.
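To get a feel for how low this ceiling is, the sketch below evaluates the LC resonant frequency $1/(2\pi\sqrt{LC})$ for a hypothetical output filter of 22 $\mu$H and 22 $\mu$F; the component values are illustrative assumptions, not taken from the described converter.

```python
import math


def lc_resonance(l, c):
    """Resonant frequency 1/(2*pi*sqrt(L*C)) of the output LC filter, in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l * c))


if __name__ == "__main__":
    l, c = 22e-6, 22e-6   # hypothetical filter values
    print(f"f_LC = {lc_resonance(l, c) / 1e3:.2f} kHz")
```

For these values the voltage loop would have to cross over below roughly 7.2 kHz, far below typical PWM frequencies, which is why fast battery steps pass through to the output before a VM loop can react.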



**Fig. 3** Non isolated voltage mode DCDC converter topology

### 2.2 Peak and Average Current Mode (PCM, ACM)

PCM and ACM control topologies offer much better rejection of fast battery voltage transients and implicit current-limiting possibilities. The drawback is a more complex structure based on two nested regulation loops. Examples of the ACM and PCM topologies are displayed in Fig. 4a and 4b, respectively. Both PCM and ACM topologies have two regulation loops, inner and outer. The slow outer loop is commonly called the “voltage loop” and the fast inner loop is often called the “current loop”. The voltage loop functionality is very close to that of the voltage loop in Fig. 3, where the voltage loop monitors the output voltage level and the error signal $V_E$ at the output of the voltage amplifier VA directly controls the pulse width modulator. In current mode control (Fig. 4a,b), the voltage amplifier output instead “programs” the coil current $I_L$ via the current loop; $I_L$ is “programmed” so that the output voltage $V_O$ approaches the required level. The primary function of the current loop is fast coil current regulation: $I_L$ is sensed and converted to the voltage $V_I$, and comparing $V_I$ with $V_E$ dictates the PWM duty cycle and thus $I_L$. The current loop regulates the average value of $I_L$ in ACM control and its peak value in PCM control.

**Fig. 4** Non-isolated DCDC current mode topologies. ACM topology (a), PCM topology (b)

**Fig. 5** ACM vs. PCM noise immunity comparison

The frequency BW of the current loop is at least an order of magnitude higher than the voltage loop BW. Such high current loop speed significantly improves the rejection of fast battery voltage fluctuations in comparison to the voltage mode control. A relevant question is: which current mode topology performs better in a harsh automotive environment? The answer is not black and white.

Both topologies have their own advantages and disadvantages. One major advantage of the ACM is better noise immunity.

Figure 5 describes this phenomenon in more detail. The core of the “problem” is the SNR of the $V_I$ voltage.

For small coil currents (i.e., light loads) the SNR declines dramatically, while the noise remains high. PCM does not (indeed cannot) apply any filtering to $V_I$. This results in higher PWM jitter, which impacts the output voltage noise and the radiated and conducted EMC emissions. ACM benefits from the presence of the current amplifier with a low-pass filter (LPF). On the one hand, the LPF frequency profile determines how closely the filtered signal approximates the average current; on the other, it impacts the inner loop speed. Typically the LPF GBW is about half of the PWM frequency.
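The SNR argument can be made concrete with a rough calculation. The sketch below assumes $V_I = R_s A_{CSA} I_L$ with the values used later in this design ($R_s = 0.25\ \Omega$, $A_{CSA} = 6$), takes the coil current equal to the load current (true only in buck mode), and uses a hypothetical fixed noise amplitude of 20 mV on $V_I$ and a hypothetical 10 MHz noise frequency. The ACM filter is modeled as a simple first-order LPF with its corner at half the PWM frequency; the real CA transfer function is not given here.

```python
import math

R_S = 0.25      # sensing resistor [ohm], value used in this design
A_CSA = 6.0     # current sensing amplifier gain, value used in this design
V_NOISE = 0.02  # hypothetical noise amplitude on V_I [V]

def v_i(coil_current_a: float) -> float:
    """Sensed voltage, assumed V_I = R_s * A_CSA * I_L."""
    return R_S * A_CSA * coil_current_a

def snr_db(coil_current_a: float, noise_v: float = V_NOISE) -> float:
    """Signal-to-noise ratio of V_I in dB."""
    return 20.0 * math.log10(v_i(coil_current_a) / noise_v)

def first_order_lpf_gain(f_hz: float, f_corner_hz: float) -> float:
    """Magnitude of a first-order low-pass filter, 1/sqrt(1 + (f/fc)^2)."""
    return 1.0 / math.sqrt(1.0 + (f_hz / f_corner_hz) ** 2)

light = snr_db(0.030)   # minimum load current from the parameter table
heavy = snr_db(0.650)   # maximum load current from the parameter table
print(f"SNR at 30 mA: {light:.1f} dB, at 650 mA: {heavy:.1f} dB")

# ACM: LPF corner at about half the PWM frequency; noise assumed at 10 MHz.
att = first_order_lpf_gain(10e6, 360e3 / 2)
print(f"LPF attenuation of 10 MHz noise: {20 * math.log10(att):.1f} dB")
```

Under these assumptions the SNR drops from about 34 dB at full load to about 7 dB at minimum load, while the LPF available in ACM suppresses the assumed 10 MHz noise by roughly 35 dB; PCM has no such filter in its comparator path.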

Disadvantages of ACM compared to PCM are a higher regulation loop complexity and slightly worse rejection of fast battery fluctuations. Current limitation is also better controlled in the PCM topology.

## 3 Practical Robust Automotive Average Current Mode Control DCDC Converter Integration

This section describes a non-isolated ACM DCDC converter integrated in the Gateway SOC. The main reason for preferring this topology was its robust noise immunity, as discussed in section 2. The converter is a step up/step down type implementing all three modes: boost, buck-boost and buck. The basic DCDC parameters are listed in the table below.

The detailed block-level structure of the converter is displayed in Fig. 6. It is formed from these blocks: CSA (Current Sensing Amplifier), CA (Current Amplifier), VA (Voltage Amplifier) and the PWM modulator. This set of blocks is the core of the regulation loop. The Mode control block selects the proper boost, buck-boost or buck mode according to the $V_{in}$ (battery) voltage.

**Table of basic DCDC converter parameters**

| Symbol      | Parameter                                 | Min     | Max     |
|-------------|-------------------------------------------|---------|---------|
| $V_{in}$    | Input line voltage                        | 3.3 V   | 30 V    |
| $I_{load}$  | Load current                              | 30 mA   | 650 mA  |
| $V_O$       | Output voltage                            | 5.54 V  | 6.77 V  |
| $I_{lim}$   | Current limitation                        | 1.8 A   | 2.2 A   |
| $f_s$       | PWM frequency                             | 324 kHz | 396 kHz |
| $T_{set}$   | $V_O$ settling time @ max $I_{load}$ step |         | 100 µs  |
| $T_{rdy}$   | Start-up time                             |         | 5 ms    |
| $D_{PWM}$   | PWM duty cycle                            | 8.3%    | 91.6%   |
| $V_{start}$ | $V_{in}$ start-up voltage                 | 4.75 V  |         |
| $T_j$       | Junction temperature                      | -45 °C  | 150 °C  |

The charge pump is an auxiliary block supporting the power switch controls. Together with the soft start block, the startup block ensures a smooth DCDC start at any actual $V_{in}$ voltage.
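As a rough consistency check of the parameter table, the sketch below compares the ideal duty cycle at the corner operating points with the 8.3% to 91.6% PWM limits and derives the minimum on-time at the highest PWM frequency. It assumes ideal, lossless continuous-conduction relations (buck $D = V_O/V_{in}$, boost $D = 1 - V_{in}/V_O$, non-inverting buck-boost $D = V_O/(V_{in}+V_O)$); real converters need extra duty to cover losses, and the buck-boost operating point is hypothetical.

```python
# Ideal (lossless, continuous conduction mode) duty-cycle relations; this is an
# assumption, since a real converter needs extra duty to cover its losses.
def ideal_duty(mode: str, v_in: float, v_out: float) -> float:
    if mode == "buck":
        return v_out / v_in
    if mode == "boost":
        return 1.0 - v_in / v_out
    if mode == "buck-boost":
        return v_out / (v_in + v_out)
    raise ValueError(f"unknown mode: {mode}")

D_MIN, D_MAX = 0.083, 0.916   # PWM duty-cycle limits from the table
F_S_MAX = 396e3               # maximum PWM frequency from the table [Hz]

corners = [
    ("buck", 30.0, 5.54),       # highest V_in, lowest V_O -> smallest duty
    ("boost", 3.3, 6.77),       # lowest V_in, highest V_O -> largest duty
    ("buck-boost", 6.0, 6.15),  # hypothetical mid-range operating point
]
for mode, v_in, v_out in corners:
    d = ideal_duty(mode, v_in, v_out)
    print(f"{mode:10s} V_in={v_in:5.2f} V  V_O={v_out:4.2f} V  D={d:.3f}  "
          f"within limits: {D_MIN <= d <= D_MAX}")

t_on_min = D_MIN / F_S_MAX
print(f"Minimum on-time at f_s = 396 kHz: {t_on_min * 1e9:.0f} ns")
```

The minimum duty of 8.3% at 396 kHz corresponds to an on-time of about 210 ns, which the power switches and drivers must be able to resolve.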



**Fig. 6** Detailed non-isolated DCDC converter implementing ACM topology

### 3.1 DCDC Layout Floor Plan

The floor plan of the silicon with the integrated DCDC converter is displayed in Fig. 7. The overall area occupied by the DCDC is marked with a dashed line, and the solid line marks the main blocks and their critical connections. There are a few strict layout and floor plan rules to be followed. Extra attention should be paid to the isolation between the power part and the regulation loop. The regulation loop blocks are positioned far from the power switches and surrounded by isolation guard rings. The most sensitive blocks, such as the CSA, CA, and VA, are placed close together with a straight signal path. The current sensing resistor has a symmetrical layout, and its connection to the CSA amplifier has to be perfectly balanced. Special care must be taken with the bonding of the power pins. It is strongly recommended to merge the power part bond pads with the power switch metallization. Also, because of electromigration, the number of bond pads depends on the current direction.

Fig. 7 DCDC converter silicon floor plan

## 4 ACM Regulation Loop Concept

### 4.1 Current Loop Stability

As mentioned in section 2, ACM involves two loops: a voltage loop and a current loop. From feedback systems theory [1, 3], it is known that the system is stable if both isolated loops are stable. This is the basis for the analysis and the design approach. The topology of the current loop and the voltage loop is displayed in Fig. 8.

Each loop must be analyzed separately using large and small signal models.

Finally both loops are put together to evaluate overall ACM control performance.

Details of the current loop large signal model analysis are displayed in Fig. 9.

The main purpose of the large signal model is to avoid parasitic comparator switching. To guarantee this, the following condition must be met:

$$\frac{\partial V_s(t)}{\partial t} > \frac{\partial V_E(t)}{\partial t} . \quad (1)$$

Otherwise, rapid comparator switching causes sub-harmonics in the coil current, with a negative impact on radiated EMC emissions and output voltage ripple. The final large signal stability condition is:

$$A_{CA} \leq V_s f_s \frac{L}{V_O R_s A_{CSA}} \quad (2)$$



**Fig. 8** Detailed ACM control loop system



**Fig. 9** Current loop large signal model analysis

where $V_s$ is the sawtooth amplitude, $f_s$ is the PWM frequency, $L$ is the coil inductance, $V_O$ is the output voltage, $R_s$ is the sensing resistor and $A_{CSA}$ is the $A_1$ amplifier gain. This is a very important result, because equation (2) defines the maximum HF gain the current amplifier can have while meeting the large signal stability criterion.
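Equation (2) can be evaluated directly. The sketch below uses $R_s = 0.25\ \Omega$ and $A_{CSA} = 6$ from this design; the sawtooth amplitude (1 V), coil inductance (22 µH) and output voltage (6 V) are hypothetical. It also re-checks the slope condition (1) at the limit, assuming the buck-mode coil down-slope $V_O/L$ is scaled by $R_s A_{CSA} A_{CA}$ at the CA output.

```python
V_SAW = 1.0      # hypothetical sawtooth amplitude V_s [V]
F_S = 360e3      # PWM frequency f_s [Hz]
L_COIL = 22e-6   # hypothetical coil inductance L [H]
V_OUT = 6.0      # hypothetical output voltage V_O [V]
R_S = 0.25       # sensing resistor R_s [ohm], value used in this design
A_CSA = 6.0      # current sensing amplifier gain, value used in this design

def max_ca_hf_gain(v_saw, f_s, inductance, v_out, r_s, a_csa):
    """Eq. (2): A_CA <= V_s * f_s * L / (V_O * R_s * A_CSA)."""
    return v_saw * f_s * inductance / (v_out * r_s * a_csa)

a_ca_max = max_ca_hf_gain(V_SAW, F_S, L_COIL, V_OUT, R_S, A_CSA)

# Slope check behind Eq. (1): sawtooth slope versus CA output slope, taking the
# coil down-slope as V_O / L (buck mode) and scaling it by R_s * A_CSA * A_CA.
saw_slope = V_SAW * F_S
ca_slope = a_ca_max * R_S * A_CSA * V_OUT / L_COIL
print(f"A_CA,max = {a_ca_max:.2f}")
print(f"saw slope = {saw_slope:.3e} V/s, CA slope at the limit = {ca_slope:.3e} V/s")
```

At the gain limit the two slopes are equal, which is exactly the boundary of condition (1); any higher CA gain makes the amplified current ripple steeper than the sawtooth.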

Figure 10 summarizes the large signal model requirement.



**Fig. 10** Large signal model current amplifier gain restriction

Small signal current loop stability analysis is more complex, and all operational modes (boost, buck-boost and buck) must be analyzed separately.

Small signal analysis of any operational mode requires an appropriate model of the power part. Deriving such a model is far beyond the scope of this paper; more details can be found in [1] and [3]. The current loop small signal model of the buck mode, using the appropriate buck mode power part model, is displayed in Fig. 11. The shaded area in the figure represents the modeled power part. Using standard linear circuit analysis, we obtain a set of equations describing the required frequency response of the current loop.

$$A_{CL}(f) = A_{CA0} \frac{\left(1 + \frac{jGBW_{ACL}}{f_Z}\right)}{\left(1 + \frac{jGBW_{ACL}}{f_p}\right)} \frac{V_{in}Z_T}{(R_S + j2\pi GBW_{ACL}L)V_s} \quad (3)$$

$$A_{CA}(f) = A_{CA0} \frac{1 + jf/f_Z}{1 + jf/f_p} \quad (4)$$

$$GBW_{ACL} = \frac{V_{in}f_s L - V_O R_S}{2\pi V_O L} \quad (5)$$
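Equation (5) shows how the current loop unity gain frequency scales with the operating point. The sketch below evaluates it for two hypothetical buck-mode input voltages, using $R_s = 0.25\ \Omega$ from this design together with a hypothetical 22 µH coil and 6 V output.

```python
import math

def gbw_acl(v_in: float, f_s: float, inductance: float,
            v_out: float, r_s: float) -> float:
    """Current-loop unity-gain frequency of Eq. (5) [Hz]."""
    return (v_in * f_s * inductance - v_out * r_s) / (2 * math.pi * v_out * inductance)

F_S = 360e3     # PWM frequency [Hz]
R_S = 0.25      # sensing resistor [ohm], value used in this design
L_COIL = 22e-6  # hypothetical coil inductance [H]

for v_in in (8.0, 12.0):   # hypothetical buck-mode input voltages [V]
    gbw = gbw_acl(v_in, F_S, L_COIL, v_out=6.0, r_s=R_S)
    print(f"V_in = {v_in:4.1f} V: GBW_ACL = {gbw / 1e3:6.1f} kHz ({gbw / F_S:.2f} x f_s)")
```

The dominant term is $V_{in} f_s/(2\pi V_O)$, so the current loop becomes faster as the input voltage rises; the $R_s$ term only lowers it slightly.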



**Fig. 11** Example of buck mode small signal stability model

### 4.2 Voltage Loop Stability

Up to now, the voltage loop was assumed to be open for the current loop analysis. Now that the current loop has been analyzed and described by its large signal model (2) and small signal models (3), (4) and (5), the voltage loop can be added. For DC signals the loop is closed, while for AC signals it is opened by the AC stopper.



Fig. 12 Current loop buck mode frequency response example

The AC stimulus is injected via the AC source $V_{AC}$. The goal of the following analysis is to determine the voltage loop DC gain, the voltage loop unity gain frequency $GBW_{VL}$, and the required $GBW_{VA}$ of the voltage amplifier. As in the current loop analysis, the voltage loop large signal model is analyzed first to derive the required DC gain. The large signal model together with the required I-V characteristic is displayed in Fig. 14.

DCDC converter performance is mostly specified by the maximum and minimum load current and the corresponding voltage drop $\Delta V_O$ (see Fig. 14). The minimal voltage amplifier LF gain is given by:

$$A_{VA0} = R_s A_{CSA} \frac{\Delta I_L}{\Delta V_O} \quad (6)$$

where $A_{CSA}$ is the LF current sensing amplifier gain.

Fig. 13 Voltage loop AC model

**Fig. 14** Voltage loop large signal model and its I-V characteristic

For the small signal derivation, the closed current loop can be modelled as a voltage controlled current source (VCCS), whose gain is given by (7) for $f < GBW_{ACL}$.

$$g_{mp} = \frac{1}{R_s A_{CSA}} \quad (7)$$
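Equations (6) and (7) reduce to simple arithmetic once the load step and the allowed static output drop are fixed. The sketch below uses $R_s = 0.25\ \Omega$ and $A_{CSA} = 6$ from this design and the 30 mA to 650 mA load range from the parameter table; it assumes buck mode (so $\Delta I_L$ equals the load step), and the allowed drop of 100 mV is a hypothetical target.

```python
import math

R_S = 0.25   # sensing resistor [ohm], value used in this design
A_CSA = 6.0  # LF current sensing amplifier gain, value used in this design

def a_va0_min(delta_i_l: float, delta_v_o: float) -> float:
    """Minimum voltage-amplifier LF gain, Eq. (6)."""
    return R_S * A_CSA * delta_i_l / delta_v_o

g_mp = 1.0 / (R_S * A_CSA)   # closed current-loop transconductance, Eq. (7) [A/V]

# Buck mode (coil current = load current): 30 mA -> 650 mA load step.
# The allowed static output drop of 100 mV is a hypothetical target.
a_va0 = a_va0_min(delta_i_l=0.650 - 0.030, delta_v_o=0.100)
print(f"A_VA0 >= {a_va0:.1f} ({20 * math.log10(a_va0):.1f} dB), g_mp = {g_mp:.3f} A/V")
```

Each volt at the current loop input commands about 0.67 A of average coil current, so a 100 mV static error budget over the full load range requires a voltage amplifier LF gain of at least about 9.3 (19.4 dB).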

Figure 15 depicts this AC model. An important consequence of the current loop for the voltage loop analysis is that it “hides” the inductance $L$.

In other words, the voltage loop AC model does not see the $R_L$-$L$-$C_F$ resonator, but only the $C_F$, $R_L$ load. This makes the stability conditions easier to meet for ACM than for VM.

According to Fig. 15, the voltage open loop gain formula is:



**Fig. 15** Voltage loop small signal model

$$A_{VL}(f) = A_{VA}(f) \frac{R_L}{R_s A_{CSA}} \frac{1 + j 2\pi f R_{ESR} C_F}{\left(1 + j \frac{f}{GBW_{ACL}}\right) \left(1 + j 2\pi f C_F (R_L + R_{ESR})\right)} \quad (8)$$

where $A_{VA}(f)$ is the voltage loop amplifier gain (dashed line in Fig. 16), whose DC value is given by (6). The magnitude and phase frequency response of the open voltage loop (8) is depicted in Fig. 16.
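The open voltage loop can be evaluated numerically to estimate its crossover frequency and phase margin. The sketch below builds the loop gain as the product of $A_{VA}(f)$, the transconductance $g_{mp}$ of (7), the load impedance $R_L \parallel (R_{ESR} + 1/(j2\pi f C_F))$ and a first-order current loop roll-off at $GBW_{ACL}$. The single-pole voltage amplifier model and all numeric values except $R_s$ and $A_{CSA}$ are hypothetical assumptions chosen for illustration.

```python
import cmath
import math

R_S, A_CSA = 0.25, 6.0   # values used in this design

def a_vl(f, a_va0, gbw_va, r_l, c_f, r_esr, gbw_acl):
    """Open voltage-loop gain: A_VA * g_mp * Z_load * current-loop roll-off."""
    w = 2 * math.pi * f
    a_va = a_va0 / (1 + 1j * f * a_va0 / gbw_va)   # assumed single-pole VA model
    z_load = r_l * (1 + 1j * w * r_esr * c_f) / (1 + 1j * w * c_f * (r_l + r_esr))
    return a_va * z_load / (R_S * A_CSA) / (1 + 1j * f / gbw_acl)

def crossover(gain, f_lo=1.0, f_hi=1e8, iters=100):
    """Bisection on log frequency for |gain(f)| = 1 (|gain| assumed decreasing)."""
    for _ in range(iters):
        f_mid = math.sqrt(f_lo * f_hi)
        if abs(gain(f_mid)) > 1.0:
            f_lo = f_mid
        else:
            f_hi = f_mid
    return math.sqrt(f_lo * f_hi)

# Hypothetical operating point: 9 ohm load, 100 uF / 0.1 ohm output capacitor.
params = dict(a_va0=9.3, gbw_va=1e6, r_l=9.0, c_f=100e-6, r_esr=0.1, gbw_acl=110e3)

def loop(f):
    return a_vl(f, **params)

f_c = crossover(loop)
pm = 180.0 + math.degrees(cmath.phase(loop(f_c)))
print(f"DC loop gain {abs(loop(1e-3)):.1f}, crossover {f_c / 1e3:.1f} kHz, "
      f"phase margin {pm:.0f} deg")
```

With these values the loop crosses over near 12 kHz with a phase margin of roughly 115°. Because the current loop hides $L$, the load contributes only a single pole, so no LC double pole has to be compensated as in VM.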



**Fig. 16** Open voltage loop magnitude and phase frequency response

### 4.3 ESR Stability Impact

As mentioned in section 1, automotive electronics are typically required to operate over an ambient temperature range from $-45^{\circ}\text{C}$ up to $150^{\circ}\text{C}$.

The low temperature limit is especially critical for DCDC converter stability. Low temperatures cause the electrolyte to freeze, resulting in a dramatic rise in ESR and, consequently, an uncontrolled shift of the ESR zero, as shown in Fig. 17.

As can be seen, the loop stability criterion is met for $\text{ESR} = 0.1 \Omega$; for $\text{ESR} = 1 \Omega$ the stability is poor, while for $\text{ESR} = 10 \Omega$ the regulation is unstable. For DCDC designs in automotive applications it is essential to select output electrolytic capacitors with the lowest possible ESR, specified as a worst case at the low temperature limit. Another option is to add a few ceramic capacitors in parallel to partially compensate for the ESR impact, as depicted in Fig. 15.
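The zero shift in Fig. 17 follows from $f_{ESR} = 1/(2\pi R_{ESR} C_F)$. The sketch below evaluates it for the three ESR values discussed above, with a hypothetical 100 µF output capacitor.

```python
import math

def esr_zero_hz(r_esr: float, c_f: float) -> float:
    """Frequency of the output-capacitor ESR zero, 1 / (2*pi*R_ESR*C_F)."""
    return 1.0 / (2.0 * math.pi * r_esr * c_f)

C_F = 100e-6  # hypothetical output capacitance [F]
for r_esr in (0.1, 1.0, 10.0):   # ESR values discussed for Fig. 17 [ohm]
    print(f"ESR = {r_esr:5.1f} ohm -> zero at {esr_zero_hz(r_esr, C_F):8.1f} Hz")
```

Each tenfold ESR rise moves the zero one decade lower (from about 15.9 kHz to about 159 Hz here). Above the zero the capacitor branch behaves like the resistor $R_{ESR}$, so the loop gain stops rolling off and its high-frequency level grows roughly in proportion to $R_{ESR}$ while $R_{ESR} \ll R_L$, which pushes the crossover towards the current loop and amplifier poles.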



**Fig. 17** ACM open loop frequency response sensitivity on ESR

## 5 Current Sensing and Limitation

Almost all DCDC converter circuits need inductor current sensing for several internal functions [1, 4], especially current mode DCDC converters, which use coil current sensing as part of their inner (current) loop feedback. For current mode DCDC converters, the current sensing accuracy directly impacts the overall converter performance. Over current protection is also important, because the surrounding electronics and the DCDC circuit itself must be protected both during steady state operation (where the DCDC control loop is balanced) and during the start-up phase (when the control loop is unbalanced). For DCDC converters using only a single mode, like buck mode (step down) or boost mode (step up), current sensing and current limitation are much easier to implement [5]. In contrast, converters that can smoothly switch among buck, buck-boost, and boost modes [2] require more complex current sensing and limitation circuits because of the inductor's high voltage swing and high frequency common mode voltage (CMM). The problem is that the voltage representing the coil current is orders of magnitude smaller than the CMM ripple. The sensing circuitry must therefore provide a high common mode rejection ratio (CMRR) at high frequencies, together with high voltage robustness. A good overview of existing current sensing circuits and techniques can be found in [2] and [5]. These circuits have well elaborated sensing methods but assume a single (up or down) DCDC mode. Frequently, they assume a known value of L without taking into account part tolerances, parasitic resistance, manufacturing tolerances or temperature dependencies (especially important in automotive applications), all of which results in inaccurate sensing. Other methods mirror the current through a power switch, which also suffers from marginal accuracy due to the high mirroring factor.

A practical implementation of an accurate current sensing and limitation circuit for a step up/down ACM DCDC converter is shown in Fig. 18. Current sensing is based on the 250 mΩ sensing resistor $R_s$. This resistor is located after the coil, which significantly helps to filter the input noise typical of harsh automotive applications. All the sensing circuitry is implemented in low voltage CMOS ($0.7 \mu m$) with a maximum voltage of $5.5 \text{ V}$. To scale down the $R_s$ common mode voltage swing (between $0 \text{ V}$ and $V_{\text{out}} + V_{D2}$), the scaling resistors $R_3$ and $R_4$ are employed with a scaling factor of $1/3$. This avoids overvoltage on the low voltage CSA amplifier input.

**Fig. 18** Embedded current sensing and limitation circuitry
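The need for the 1/3 scaling can be checked against the 5.5 V limit of the low voltage CMOS. The sketch below takes the peak common mode voltage as $V_{out} + V_{D2}$ over the specified output voltage range; the 0.7 V forward drop of $D_2$ is a hypothetical value.

```python
V_MAX_CMOS = 5.5    # maximum voltage of the low-voltage CMOS [V], from the text
SCALE = 1.0 / 3.0   # R3/R4 scaling factor, from the text
V_D2 = 0.7          # hypothetical forward drop of diode D2 [V]

def cm_at_csa(v_out: float, scale: float = SCALE) -> float:
    """Peak R_s common-mode voltage (V_out + V_D2) as seen by the CSA inputs."""
    return (v_out + V_D2) * scale

for v_out in (5.54, 6.77):   # output-voltage range from the parameter table
    print(f"V_out = {v_out:.2f} V: unscaled {cm_at_csa(v_out, 1.0):.2f} V, "
          f"scaled {cm_at_csa(v_out):.2f} V (limit {V_MAX_CMOS} V)")
```

Unscaled, the common mode peak of about 7.5 V would exceed the 5.5 V rating; after scaling it stays below 2.5 V, leaving ample margin.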

An important feature is the scaling symmetry, based on the easily met condition $R_s \ll R_3, R_4$ and on the matched resistors $R_{3A}, R_{3B}$ and $R_{4A}, R_{4B}$. The common node of the scaling resistors is connected to AGND, the analog ground. In this case the CSA inputs $A+$ and $A-$ see the same impedance, which is another factor that strongly improves noise immunity. The key block with the dominant impact on current sensing performance is the four-input differential amplifier CSA. Together with the $R_1$, $R_2$ feedback, it serves as a transimpedance amplifier converting the coil current to the VCS voltage. The CSA amplifier circuit structure is displayed in Fig. 19. Its key feature is signal isolation between the differential ports A and B. Port A is directly exposed to the coil common mode voltage. In the buck-boost and boost modes, the coil common mode voltage (not the voltage across the coil itself) changes very fast from 0 to $V_o + V_{diode}$.

**Fig. 19** Current sensing amplifier with associated signals

The common mode slew rate (SR) can easily exceed $400 \text{ V}/\mu\text{s}$. Port B serves as a feedback input with, ideally, zero common mode voltage ripple. In practice the CMRR is finite, and with such a high SR the CSA output voltage contains small residual glitches. It is very important to keep these glitches symmetrical (Fig. 19) to avoid potential rectification by the CA amplifier, which would cause a DC shift inside the current loop and thus a larger output voltage error. This requirement is ensured by the symmetrical structure of the CSA.
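A rough budget shows why a high CMRR at high frequencies is needed. The sketch below uses the 400 V/µs slew rate from the text, $R_s = 0.25\ \Omega$ and the 30 mA minimum load; the 7.47 V common mode step (6.77 V plus a 0.7 V diode) and the criterion that the residual glitch must stay below 10% of the light-load sense signal are hypothetical assumptions.

```python
import math

SR_CM = 400e6            # common-mode slew rate, 400 V/us from the text [V/s]
V_CM_STEP = 7.47         # hypothetical CM step: V_O = 6.77 V plus a 0.7 V diode [V]
R_S = 0.25               # sensing resistor [ohm]
I_MIN = 0.030            # minimum load current from the parameter table [A]
RESIDUAL_FRACTION = 0.1  # hypothetical: residual glitch <= 10 % of the sense signal

edge_time = V_CM_STEP / SR_CM
v_sense_min = R_S * I_MIN
cmrr_db = 20 * math.log10(V_CM_STEP / (RESIDUAL_FRACTION * v_sense_min))
print(f"CM edge lasts about {edge_time * 1e9:.1f} ns")
print(f"R_s signal at 30 mA: {v_sense_min * 1e3:.1f} mV")
print(f"CMRR needed during the edge: about {cmrr_db:.0f} dB")
```

Under this criterion the CSA would need roughly 80 dB of CMRR for edges lasting only about 19 ns, which is hard to achieve; this is why the design relies on keeping the unavoidable residual glitches symmetrical rather than on suppressing them completely.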

Both inputs of port A ($Q_{A1}, Q_{A2}$) and port B ($Q_{B1}, Q_{B2}$) are perfectly symmetrical. The second amplifier ($Q_3, Q_4$) is symmetrical as well, which significantly helps to achieve a high CMRR and results in excellent current loop performance. Real measured signals of the current sensing circuitry are displayed in Fig. 20. In this case, the DCDC converter was running in buck-boost mode at maximum load. Fig. 20 shows that the VCS signal is, for all practical purposes, not influenced by the high and fast CMM voltage swing.

**Fig. 20** Current sensing measured signals RSA, RSB 2 V/div; VCS 500 mV/div; $I_{coil}$ 1 A/div

### 5.1 Current Sensing Block Layout

Details of current sensing block layout are displayed in Fig. 21. Connection to external components is also included. An important layout requirement is the symmetrical placement and connection of the  $R_s$  sensing resistor, which is split on purpose into two parts connected in parallel.

The CSA input connection to the  $R_s$  must also be well balanced – like symmetrical transmission lines.

## 6 Operational Mode Control

As mentioned above, the DCDC converter uses all three modes: boost, buck-boost and buck. Switching between modes is primarily based on monitoring the input line voltage $V_{in}$. The signal $D_{PWM}$ is used for mode swap synchronization.



**Fig. 21** Current sensing layout detail with internal and external connection

A mode change always happens after the active $T_{ON}$ phase to avoid sudden asynchronous coil current changes.

The detailed mode switching system is depicted in Fig. 22. Comparators continuously monitor $V_{in}$, and their output signals $CMP_{1,2}$ are coded and synchronized with the $D_{PWM}$ signal in the digital Control Block. The mode change takes place on the falling edge of $D_{PWM}$.
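The synchronization rule can be sketched as a small behavioral model: the comparator outputs are decoded into a requested mode at every step, but the active mode is only updated on a falling edge of $D_{PWM}$, taken here as the end of $T_{ON}$. The comparator thresholds (5 V and 8 V) and the exact decoding are hypothetical, since the text does not give them; the buck-boost initial mode follows the start-up behavior described in section 8.2.

```python
from enum import Enum

class Mode(Enum):
    BOOST = "boost"
    BUCK_BOOST = "buck-boost"
    BUCK = "buck"

# Hypothetical comparator thresholds; the real values are not given in the text.
V_TH_CMP1 = 5.0   # above this, boost is no longer needed [V]
V_TH_CMP2 = 8.0   # above this, pure buck is possible [V]

def decode(cmp1: bool, cmp2: bool) -> Mode:
    """Map the two comparator outputs to the requested mode."""
    if cmp2:
        return Mode.BUCK
    if cmp1:
        return Mode.BUCK_BOOST
    return Mode.BOOST

class ModeControl:
    """Applies the requested mode only on a falling edge of D_PWM."""

    def __init__(self, initial: Mode = Mode.BUCK_BOOST) -> None:
        self.mode = initial        # start-up always uses buck-boost
        self._prev_dpwm = False

    def step(self, v_in: float, d_pwm: bool) -> Mode:
        requested = decode(v_in > V_TH_CMP1, v_in > V_TH_CMP2)
        if self._prev_dpwm and not d_pwm:   # falling edge: end of T_ON
            self.mode = requested
        self._prev_dpwm = d_pwm
        return self.mode

mc = ModeControl()
print(mc.step(12.0, True))    # T_ON active: stays Mode.BUCK_BOOST
print(mc.step(12.0, False))   # falling edge: switches to Mode.BUCK
```

Latching the decision on the edge rather than acting on the comparators directly is what prevents a mode change in the middle of an on-phase.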



**Fig. 22** DCDC mode swapping

## 7 Power Switches Control

### 7.1 Charge Pump

Both power switches (high side and low side) are NDMOS transistors. The low side driver does not need an extra floating voltage for NDMOS control, whereas the high side driver requires a floating voltage source. Figure 23 schematically depicts the low and high side drivers together with the charge pump. This charge pump serves as the floating voltage source supplying the high side driver, and $C_1$ stores this floating voltage.
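A first-order way to size $C_1$ is a charge balance: each high side turn-on draws roughly the gate charge $Q_g$ from $C_1$, so the droop is $\Delta V \approx Q_g / C_1$. The sketch below assumes that the charge pump does not replenish $C_1$ during the turn-on event and neglects driver quiescent current; the 10 nC gate charge and the 100 mV droop target are hypothetical.

```python
def min_floating_cap(q_gate: float, max_droop: float) -> float:
    """Smallest C1 that keeps the droop per turn-on below max_droop [F]."""
    return q_gate / max_droop

def droop(q_gate: float, c1: float) -> float:
    """Voltage drop on C1 when it delivers the gate charge q_gate [V]."""
    return q_gate / c1

Q_GATE = 10e-9   # hypothetical total gate charge of the high-side NDMOS [C]
c1_min = min_floating_cap(Q_GATE, max_droop=0.1)
print(f"C1 >= {c1_min * 1e9:.0f} nF for <= 100 mV droop per turn-on")
print(f"droop with C1 = 220 nF: {droop(Q_GATE, 220e-9) * 1e3:.1f} mV")
```
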

It is very important to balance the delays of the low and high side drivers in the buck-boost mode. With unbalanced delays, both the efficiency and the output current capability decline. For more details, refer to [2] and [5].



Fig. 23 Power stage control charge pump

## 8 Auxiliary and Automotive Specific DCDC Blocks

### 8.1 Static and Dynamic Current Limitation

The static and dynamic current limitation circuit and its functionality are displayed in Fig. 24. Static current limitation takes place when the regulation is in balance. Dynamic current limitation is active during the DCDC start-up phase, when both regulation loops are unbalanced. In the start-up phase, the output voltage is zero and the error voltage $V_{ERR}$ is saturated at its maximum level. If $V_{ERR}$ were directly connected to the CA amplifier, the coil current could rise to a high, uncontrolled value, because the PWM duty cycle is at its maximum. In the worst case, the DCDC converter can be destroyed.

Similarly, in steady state, when the load current exceeds its maximum, the coil current can rise to an uncontrolled value. To avoid this, a current limitation block is inserted between the VA and CA amplifiers. During the DCDC start-up phase, dynamic current limitation takes place. It is based on a $V_{ECLP}$ slope activated at the moment of start-up. While $V_{ECLP}$ is rising, the PWM duty cycle is under control, and therefore so is the coil current.

This phase is graphically depicted in Fig. 24. After $V_{ECLP}$ has reached its maximal level $V_{CLAMP}$, the coil current is limited to the desired maximal level $I_{LLim}$. Fig. 25 compares the coil current during the start-up phase with active and inactive dynamic current limitation, measured on silicon. Coil current ($I_{coil}$) and output voltage ($V_o$) waveforms are captured at the DCDC start-up moment (coil current scale 1 A/div). A smooth rise of the coil current, with current limitation at 1.8 A (the flat top of the $I_{coil}$ waveform), is achieved.
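The dynamic limitation can be modeled behaviorally: the signal passed to the CA is the smaller of $V_{ERR}$ and the ramping clamp $V_{ECLP}$, and the current loop turns that voltage into an average coil current of $V/(R_s A_{CSA})$. The sketch uses $R_s = 0.25\ \Omega$ and $A_{CSA} = 6$ from this design and a 3 V clamp level (the value that yields the 2 A limit stated for Eq. (9)); the 1 ms ramp time and the 4 V saturated $V_{ERR}$ are hypothetical, and the linear ramp shape is an assumption.

```python
R_S, A_CSA = 0.25, 6.0   # values used in this design
V_CLAMP = 3.0            # clamp level giving the stated 2 A steady-state limit [V]
T_RAMP = 1e-3            # hypothetical time for V_ECLP to reach V_CLAMP [s]

def v_eclp(t: float) -> float:
    """Soft-start clamp: linear ramp from 0 V, then held at V_CLAMP."""
    return min(V_CLAMP, V_CLAMP * t / T_RAMP) if t > 0 else 0.0

def coil_current_limit(t: float, v_err: float) -> float:
    """Average coil current the loop can request: min(V_ERR, V_ECLP)/(R_s*A_CSA)."""
    return min(v_err, v_eclp(t)) / (R_S * A_CSA)

V_ERR_SAT = 4.0  # hypothetical saturated error-amplifier output at start-up [V]
for t_ms in (0.0, 0.25, 0.5, 1.0, 2.0):
    limit = coil_current_limit(t_ms * 1e-3, V_ERR_SAT)
    print(f"t = {t_ms:4.2f} ms -> I_L limit {limit:.2f} A")
```

Even though $V_{ERR}$ is saturated, the current that can be requested grows linearly from zero and then settles at the static limit, which corresponds to the flat top seen in Fig. 25.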

### 8.2 DCDC Startup

The start-up of the DCDC converter is a relatively complex event. Dynamic current limitation (see Fig. 24) controls the starting current from $V_{in}$ to avoid high current peaks, which in the worst case could burn bond wires or on-chip metal tracks.

The same requirement applies to the output current charging the $C_F$ capacitor. In addition, in a car environment the $V_{in}$ voltage during DCDC start-up can be anywhere in the range from 4.7 V to 30 V. The start-up process depicted in Fig. 26 has two main phases. The first is to pre-charge the $C_F$ capacitor to 4.7 V, which is done by the $I_{PCHG}$ current in cooperation with the $CMP_1$ comparator. Pre-charging is stopped by the SoftStartRdy rising edge. The second phase is the DCDC soft start. Because $V_{in}$ is uncertain, the DCDC always starts in the buck-boost mode.



**Fig. 24** Static and dynamic current limitation circuit

After 1.5 ms, the appropriate mode is selected based on the $V_{in}$ level. During the soft start phase, dynamic current limitation is activated and is supported by the Soft Start block. This block dynamically controls the maximum level of its input signal $V_{ERR}$. The output signal $V_{ERR\_CLP}$ (see Fig. 26) rises during the initial transient, which provides dynamic limitation of the start-up current. Afterwards, $V_{ERR\_CLP}$ follows $V_{ERR}$ up to the clamping limit, which provides steady state current limitation. The maximum steady state average coil current is

$$I_{LAVG\ max} = \frac{V_{ERR\_CLP}}{R_s A_{CSA}} \quad (9)$$

Fig. 25 Coil current for active and inactive dynamic current limitation comparison

Fig. 26 DCDC startup process and current limitation
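Equation (9) can be inverted to find the clamp level implied by the stated 2 A limit with $R_s = 0.25\ \Omega$ and $A_{CSA} = 6$, and then used to see how component spread moves the limit. The ±4% worst-case spreads on $R_s$ and on the clamp voltage are hypothetical; they are only meant to show how the tolerances combine against the 1.8 A to 2.2 A range in the parameter table.

```python
R_S, A_CSA = 0.25, 6.0   # values stated for this design

def i_lavg_max(v_err_clp: float, r_s: float = R_S, a_csa: float = A_CSA) -> float:
    """Eq. (9): maximum steady-state average coil current [A]."""
    return v_err_clp / (r_s * a_csa)

v_clamp = 2.0 * R_S * A_CSA   # clamp level implied by the stated 2 A limit [V]
print(f"implied V_ERR_CLP = {v_clamp:.2f} V")

# Hypothetical +/-4 % spreads on R_s and on the clamp voltage, worst-case combined.
lo = i_lavg_max(v_clamp * 0.96, r_s=R_S * 1.04)
hi = i_lavg_max(v_clamp * 1.04, r_s=R_S * 0.96)
print(f"limit spread: {lo:.2f} A .. {hi:.2f} A (table: 1.8 A .. 2.2 A)")
```

Because the limit is a ratio, spreads in the numerator and the denominator add up, so the sensing resistor and the clamp reference share the same tolerance budget.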

In our case, $R_s = 0.25 \Omega$ and $A_{CSA} = 6$, which sets the steady state current limitation to 2 A, while during the $V_{ERR\_CLP}$ slope (see Fig. 26) the current limitation follows the same dynamic path. This provides a controlled and smooth $I_L$ transient. The signal $V_O$RDY is activated once $V_O$ has settled inside the specified interval. Automotive OEMs very often require a test in which the battery voltage rises very slowly (1 V/minute) while the DCDC converter must still start properly. Such a situation is displayed in Fig. 27.

A slow battery voltage ramp emulates a discharged car battery being slowly charged while the electronic systems must stay operational. The critical interval, marked with a dashed line in Fig. 27, surrounds the DCDC start-up moment. For a reliable start-up, $V_{in}$ must not drop below the $V_{in\_min}$ level (see Fig. 27). The EMC filter shown in Fig. 27 is mandatory for an automotive electronic module with a DCDC converter. This filter, together with the reverse polarity diode, causes a voltage drop between the $V_{BAT}$ and $V_{in}$ nodes. Therefore, proper dynamic current limitation is crucial for reliable start-up; an uncontrolled start-up current leads, at the very least, to relaxation oscillations. The next critical components are the electrolytic capacitors in the EMC filter, whose ESR significantly impacts the $V_{in}$ voltage dips.
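A static estimate shows how the allowed start-up current depends on the battery voltage during the slow ramp. The sketch assumes $V_{in} = V_{BAT} - V_D - I_{in} R_{series}$, where the 0.7 V reverse polarity diode drop and the 0.3 Ω series resistance (EMC filter plus capacitor ESR) are hypothetical values, and takes $V_{in\_min}$ equal to the 4.75 V start-up voltage from the parameter table.

```python
V_D_RP = 0.7     # hypothetical reverse-polarity diode drop [V]
R_SERIES = 0.3   # hypothetical EMC filter series resistance incl. ESR [ohm]
V_IN_MIN = 4.75  # assumed equal to the V_in start-up voltage from the table [V]

def v_in(v_bat: float, i_in: float) -> float:
    """Static estimate of V_in behind the EMC filter and reverse polarity diode."""
    return v_bat - V_D_RP - i_in * R_SERIES

def max_start_current(v_bat: float) -> float:
    """Largest input current that keeps V_in at or above V_IN_MIN [A]."""
    return max(0.0, (v_bat - V_D_RP - V_IN_MIN) / R_SERIES)

for v_bat in (5.5, 6.0, 7.0):
    print(f"V_BAT = {v_bat:.1f} V: I_in <= {max_start_current(v_bat):.2f} A")
```

Close to the start-up threshold only a few hundred milliamps are tolerable; a larger inrush pulls $V_{in}$ below $V_{in\_min}$, the converter stops, the current collapses, $V_{in}$ recovers and the cycle repeats, which is the relaxation oscillation mentioned above.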



**Fig. 27** DCDC startup for slow battery voltage ramp up test

## 9 Measured Results and Standard Tests

The basic set of DCDC specification parameters is listed below.

### 9.1 I-V Characteristics

The DCDC I-V characteristics for each mode are displayed in the figures below. In the actual application, a coil with a saturation current of 1 A is used, and its impact is clearly seen in Figs. 28 and 29: as soon as the coil current exceeds 1 A, $V_O$ drops abruptly. Fig. 29 shows that in buck mode the coil current stays below 1 A as long as the load current is below 1 A. In this case, $V_O$ is flat with respect to the output current, and the load current easily reaches 1 A. With a coil of higher saturation current, the overall DCDC load performance improves accordingly.
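The different behavior of the modes follows from the average coil current each mode needs for a given load. The sketch below uses ideal, lossless continuous-conduction relations (buck $I_L = I_{load}$, boost $I_L = I_{load} V_O/V_{in}$, buck-boost $I_L = I_{load}(V_{in}+V_O)/V_{in}$), ignores ripple (which makes the peak current even higher), and uses hypothetical input voltages with a 6 V output and the 0.65 A maximum load from the table.

```python
def coil_current(mode: str, i_load: float, v_in: float, v_out: float) -> float:
    """Ideal CCM average coil current for the three operating modes [A]."""
    if mode == "buck":
        return i_load
    if mode == "boost":
        return i_load * v_out / v_in
    if mode == "buck-boost":
        return i_load * (v_in + v_out) / v_in
    raise ValueError(f"unknown mode: {mode}")

I_SAT = 1.0  # coil saturation current used in the measured application [A]
for mode, v_in in (("boost", 3.3), ("buck-boost", 6.0), ("buck", 12.0)):
    i_l = coil_current(mode, i_load=0.65, v_in=v_in, v_out=6.0)
    print(f"{mode:10s} V_in={v_in:4.1f} V: I_L = {i_l:.2f} A, above I_sat: {i_l > I_SAT}")
```

In boost and buck-boost mode the coil carries more than the load current, so the 1 A saturation limit is reached well below 1 A of load, whereas in buck mode coil and load current are equal.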

### 9.2 Regulation Dynamic – Load Step

Regulation loop dynamic performance was measured with a fast load current step from the minimal load current (30 mA) to the maximal level of 0.65 A in the boost and buck-boost modes. The dynamic response in buck mode is inherently faster and is therefore not presented. The output voltage and PWM signals are captured in Fig. 30: the bottom trace is the output voltage, the top trace is the low side switch PWM control signal and the middle trace is the high side switch PWM control signal. The load current step transient lasts 81 µs in boost mode and 56 µs in buck-boost mode, both within the 100 µs settling time specification. The PWM signals show no ringing or instabilities, which confirms the current and voltage loop concept and design.

| Mode       | $t_{set}$ | $\Delta V$ |
|------------|-----------|------------|
| Boost      | 80.6 µs   | 357 mV     |
| Buck-boost | 55.7 µs   | 380 mV     |

### 9.3 Output Short Test

Another important test required in the automotive field is a periodic short and release of the DCDC output. The captured waveforms are displayed in Fig. 31. Reliable start-up without uncontrolled current spikes or thermal shutdown is a must for automotive applications.

### 9.4 EMC – Susceptibility and Radiation Results

An example of DPI (Direct Power Injection) EMC susceptibility results is displayed in Fig. 32. In this case the HF power was directly coupled to the $V_{in}$ pin (see Fig. 27). The maximal output voltage deviation is < 0.85% in the frequency range from 1 MHz up to 1 GHz. A large part of this robust immunity is due to the balanced current sensing topology (see Figs. 18 and 19).

Fig. 28 Boost and Buck-Boost I-V characteristic

Fig. 29 Buck mode I-V characteristic and measured efficiency

**Fig. 30** Boost and Buck-Boost mode regulation response in maximal load current step

An example of conducted emission measured on the battery line (VBAT) and the output voltage (VH) node is displayed in Fig. 33, together with the broadband (BB) and narrowband (NB) limits according to IEC 61967. The DCDC contribution is clearly visible as the periodically repeating 360 kHz component, but it remains below the limit.



**Fig. 31** DCDC output short – release test



Fig. 32 4 W DPI EMC susceptibility directly coupled to Vin pin



Fig. 33 panel titles: emission on VBAT measured by the 150 Ω method according to IEC 61967-4, no cabling on VBAT, C110 and C111 added, C = 150 pF added between VBAT and GND; emission on VH measured by the 150 Ω method according to IEC 61967-4, no cabling on VBAT, DCDC on.



Fig. 33 Conducted emission on battery line (VBAT) and DCDC output voltage (VH)

## 10 Application Notes

### 10.1 Ground System

An application diagram with the system of power and signal tracks highlighted is displayed in Fig. 34. A separate power ground (marked thick) and signal (sensing) ground (marked thin) are mandatory. Both grounds must be connected at the module ground connector node.



Fig. 34 DCDC application diagram example

## 11 Conclusion

A non-isolated step up/step down DCDC converter for automotive applications was discussed, with emphasis on the specific automotive requirements, which were the decisive factor in choosing the ACM topology. ACM small and large signal models for stable control loop design were explained. Important functions such as current sensing and converter start-up were elaborated. A recommended layout approach and an application example of the ground system were discussed. A selected set of EMC results shows that all strict automotive requirements are met, and the fabricated silicon confirms that the concept is sound.

## References

1. Wu, K.: Switch Mode Power Converters. Academic Press, MA, USA (2006)
2. Unitrode: “Average Current Mode Control of Switching Power Supplies”. Application Note U-140
3. Sun, C., Lehman, B., Ciprian, R.: Dynamic Modeling and Control in Average Current Mode Controlled PWM DC/DC Converters. IEEE Transactions on Power Electronic Systems (1999), ISBN 07803-5421-4, pp. 1152–1157
4. Unitrode: “The UC3886 PWM Controller Uses Average Current Mode Control”. Application Note U-156
5. Texas Instruments: “Understanding Buck-Boost Power Stages in Switch Mode Power Supplies”. Application Report SLUA059A

# Highly Integrated Power Management Integrated Circuits in Advanced CMOS Process Technologies

Mario Manninger

**Abstract** Today's portable devices combine audio and video playback with wireless communication and navigation. As a consequence, the computing power, the display size and the graphics operations of portable devices are increasing. Even when processors move to new process technologies, the power consumption often increases due to the higher operating frequencies. Battery technology is not improving at the same rate, and as a result, intelligent power management becomes mandatory to achieve the required operating hours and days. In addition, portable devices are becoming smaller and slimmer, which requires a reduction in the number and size of components. In this paper, highly integrated power management ICs implemented in modern CMOS process technologies are discussed.

## 1 Introduction

A portable device supports use cases such as high fidelity audio playback, streaming of video content, wireless communication and location based services. This often results in a partition of the hardware into the modem for communication, a multi-standard radio receiver module, a Bluetooth™ module for hands-free talking, a GPS module, an application processor with graphics acceleration, a memory subsystem with an SDRAM and large Flash memory, a battery (today lithium-ion batteries are widely used), a color display, a power management system and a high performance audio subsystem (Fig. 1). The power consumption of such a system varies enormously between use cases: the lowest, below 25 mW, is measured for MP3 audio playback to headphones, while over 2 W is required for video decoding, including the power for the QVGA color display and the stereo speakers. Many different use cases have to be considered, and for each one the overall power consumption must be optimized to guarantee long battery operation. In addition to the power consumption, the overall size is a key differentiator for a portable device.

---

M. Manninger (✉)  
austriamicrosystems AG, Unterpremstätten, Austria  
e-mail: ealarcon@eel.upc.edu



**Fig. 1** Block diagram of a portable device

Lower power consumption of course enables the use of smaller batteries; in addition, devices get smaller every year while their functionality increases, which is only possible through higher integration. This paper discusses the current status, new requirements and developments for highly integrated power management ICs for the next generation of portable devices.

## 2 Power Management Blocks

Integrating all power management functions into the processor devices is often not cost effective. Many power management functions are blocks that are connected directly to the battery and have to operate at input voltages of 5 V.

### 2.1 Battery Subsystem

Small portable devices, such as MP3 players, use either AA or AAA batteries or rechargeable lithium-ion batteries with capacities below 400 mAh. Li-Ion batteries can be charged quickly by linear chargers with currents of about 500 mA. High capacity Li-Ion batteries are used in portable devices with higher operating power consumption, especially in devices with larger displays. These require fast charging with high currents from wall adapters or car batteries, or high accuracy linear charging from USB supplies, where the maximum allowed current of 500 mA must not be exceeded. The charger subsystem must support operation of the portable device without a battery in place or with a deeply discharged battery while it is connected to the external supply. A fully charged battery is completely isolated from the system and from the supply to increase the battery lifetime. To decrease the charging time, charging with high external voltages (and high currents) is performed by a high efficiency DC/DC step-down charger, which is mandatory to reduce the power dissipation in the power management IC and in the portable device.

### 2.2 DC/DC Voltage Regulators

The battery powers the individual functions of the complete portable device with the appropriate voltages and currents. Many different, independent voltage rails are supplied with the highest possible efficiency using inductive DC/DC converters or charge pumps.

The supply voltage of the processor cores in modern deep-submicron process technologies (90 nm, 65 nm, 45 nm) is between 0.65 V and 1.4 V. With a linear regulator the efficiency would be below 30%, so the core voltage is generated by an inductive DC/DC buck converter with an efficiency of over 90%. The current of the processor core IC varies over a wide range, from a few mA up to several hundred mA, depending on the clock frequency and on the activity of the different modules on the chip. It is therefore essential that the DC/DC converter achieves high efficiency even at light load. Dynamic voltage scaling techniques reduce the supply voltage when the processor IC is operated at lower clock frequencies and can reduce the dynamic power consumption significantly [1–3]:
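The efficiency figures above follow from a simple bound: an ideal linear regulator passes the full load current from input to output, so its efficiency cannot exceed $V_{out}/V_{in}$. The minimal sketch below assumes a 3.6 V nominal Li-Ion voltage (the value this chapter uses later for the 3 V rail), ignores the regulator's quiescent current, and takes the 90% buck figure from the text as a constant; the function name is illustrative only.

```python
def ldo_efficiency(v_in: float, v_out: float) -> float:
    """Upper bound on linear-regulator efficiency (quiescent current ignored).

    The pass device carries the full load current I, so P_in = V_in * I and
    P_out = V_out * I, which gives eta = V_out / V_in.
    """
    if not 0 < v_out <= v_in:
        raise ValueError("a linear regulator can only step down: need 0 < v_out <= v_in")
    return v_out / v_in


V_BAT_NOMINAL = 3.6      # assumed nominal Li-Ion voltage (V)
BUCK_EFFICIENCY = 0.90   # buck efficiency from the text, treated as constant

for v_core in (0.65, 1.0, 1.4):
    eta = ldo_efficiency(V_BAT_NOMINAL, v_core)
    print(f"{v_core:.2f} V core: LDO {eta:5.1%} vs buck ~{BUCK_EFFICIENCY:.0%}")
```

At 3.6 V input the bound stays below 30% for core voltages up to about 1.08 V and rises to roughly 39% at the 1.4 V end of the range, still far below a buck converter. The same bound gives about 83% for the 3 V Flash rail discussed later.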

$$P_{dynamic} = C_{active} V^2 f$$

Because the dynamic power consumption $P_{dynamic}$ changes quadratically with the supply voltage $V$ (where $C_{active}$ represents the sum of the gate and interconnect capacitances that toggle at the frequency $f$), this technique is very effective. However, it requires very accurate voltage regulation with fine resolution and smooth, spike-free transitions from one voltage state to the next. Adaptive voltage scaling techniques go one step further and measure the process speed on chip; this speed depends on the actual performance of the individual IC and varies with the junction temperature. The measurement is used in a closed loop to operate the IC at its optimal supply voltage.
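The quadratic dependence is easiest to see with numbers. The sketch below evaluates $P_{dynamic} = C_{active} V^2 f$ for a hypothetical processor (1 nF switched capacitance, 400 MHz at 1.2 V) and a half-speed state, assuming that 1.0 V is sufficient at 200 MHz; none of these values come from the text. Leakage power is ignored.

```python
def dynamic_power(c_active: float, v: float, f: float) -> float:
    """P_dynamic = C_active * V^2 * f (switching power only, leakage ignored)."""
    return c_active * v**2 * f


C_ACTIVE = 1e-9   # hypothetical switched capacitance (F)

full = dynamic_power(C_ACTIVE, 1.2, 400e6)             # full speed
half_freq_only = dynamic_power(C_ACTIVE, 1.2, 200e6)   # f/2, voltage unchanged
half_freq_dvs = dynamic_power(C_ACTIVE, 1.0, 200e6)    # f/2 with voltage scaling

print(f"full speed          : {full * 1e3:.0f} mW")
print(f"f/2, same voltage   : {half_freq_only * 1e3:.0f} mW")
print(f"f/2, voltage scaled : {half_freq_dvs * 1e3:.0f} mW")
```

Halving the clock alone halves the power, while also lowering the supply from 1.2 V to 1.0 V brings it down to about 35% of the full-speed value.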

The switching frequency of the DC/DC converters is a trade-off between efficiency, which decreases with increased switching activity, and the size of the external inductors. Frequencies of 2 to 3 MHz can be used with external inductors of 1.0 to 2.2 $\mu$H. Larger inductors usually give higher efficiencies because they allow a lower switching frequency. The trend towards ever smaller components, however, will push switching frequencies even higher in the near future.
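The trade-off can be sketched with two textbook relations for an ideal buck converter in continuous conduction: the peak-to-peak inductor ripple $\Delta I_L = V_{out}(1-D)/(L f_s)$ with $D = V_{out}/V_{in}$, and a gate-drive loss that grows linearly with $f_s$. The total gate capacitance of MP and MN (200 pF) and the 3.6 V to 1.2 V conversion are assumptions, and conduction and other switching losses are left out.

```python
def buck_ripple_current(v_in: float, v_out: float, l: float, f_s: float) -> float:
    """Peak-to-peak inductor ripple of an ideal buck in continuous conduction."""
    d = v_out / v_in                    # ideal duty cycle
    return v_out * (1 - d) / (l * f_s)


def gate_drive_loss(c_gate_total: float, v_drive: float, f_s: float) -> float:
    """Power spent charging and discharging the switch gates once per cycle."""
    return c_gate_total * v_drive**2 * f_s


V_IN, V_OUT = 3.6, 1.2   # assumed battery and core voltages (V)
C_GATE = 200e-12         # hypothetical total gate capacitance of MP + MN (F)

for f_s, l in ((2.2e6, 2.2e-6), (3.0e6, 1.0e-6), (6.0e6, 1.0e-6)):
    ripple = buck_ripple_current(V_IN, V_OUT, l, f_s)
    loss = gate_drive_loss(C_GATE, V_IN, f_s)
    print(f"f_s={f_s / 1e6:.1f} MHz, L={l * 1e6:.1f} uH: "
          f"ripple {ripple * 1e3:.0f} mA, gate-drive loss {loss * 1e3:.2f} mW")
```

Shrinking the inductor at a fixed frequency raises the ripple; restoring it requires a higher $f_s$, which in turn raises the frequency-proportional loss.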

DC/DC buck converters with the highest performance are key building blocks for integrated power management ICs. These blocks must be optimized for efficiency and, at the same time, be flexible enough to be used with different processors. A voltage control loop and a current control loop are used for high accuracy and high efficiency. Most handset manufacturers request fixed-frequency DC/DC converters, which require difficult regulation schemes to avoid the occurrence of subharmonics [4].

A block diagram of a fully integrated DC/DC buck converter is shown in Fig. 2. A clock frequency of 2.2 MHz is used with an external coil of 2.2 $\mu$H and a capacitor of 10 $\mu$F. Over-current and over-voltage protection and an over-temperature shutdown are included in this block to avoid damage to the power management IC and to the PCB. The current in the PMOS switch (MP) is sensed and added to the compensation ramp, which is used to avoid subharmonics. The output voltage feedback uses a variable resistor R1, which programs the output voltage. Resistor R2 together with an opamp forms a current sink. Changing the feedback resistor R1 changes the output voltage but not the feedback-loop gain. This improves loop stability and guarantees the wide output voltage range required for voltage scaling in today's deep-submicron process technologies. Changes of R1 are controlled by a counter, which generates a smooth transition between voltages during dynamic voltage scaling. The output voltage can be changed in steps of 25 mV, and the counter can operate with a period of 4 $\mu$s, 8 $\mu$s or 16 $\mu$s. Figure 3 shows a typical ramp with an output voltage change from 1.8 V to 1.2 V in 96 $\mu$s (24 steps of 25 mV, each lasting 4 $\mu$s).
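The counter-driven ramp can be sketched as a staircase of programmed target voltages. This models only the counter output (25 mV steps at a 4, 8 or 16 $\mu$s period, as stated above), not the filtered output voltage of the converter; the function name is hypothetical. Voltages are kept in integer millivolts so that the step count is exact.

```python
def dvs_ramp(v_start_mv: int, v_end_mv: int, step_mv: int = 25, period_us: int = 4):
    """Return (time_us, target_mv) points generated by the DVS step counter."""
    if period_us not in (4, 8, 16):
        raise ValueError("counter period must be 4, 8 or 16 us")
    delta = v_end_mv - v_start_mv
    if delta % step_mv:
        raise ValueError("voltage change must be a whole number of steps")
    n_steps = abs(delta) // step_mv
    direction = 1 if delta > 0 else -1
    return [(k * period_us, v_start_mv + direction * k * step_mv)
            for k in range(n_steps + 1)]


ramp = dvs_ramp(1800, 1200)   # the 1.8 V to 1.2 V transition of Fig. 3
print(len(ramp) - 1, "steps, final point:", ramp[-1])
```

With the slowest counter setting the same transition would take four times as long, which trades transition speed for smaller current spikes into the output capacitor.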



**Fig. 2** Block diagram of the fixed-frequency DC/DC step-down converter

**Fig. 3** Ramping the output voltage from 1.8 V to 1.2 V



As Fig. 3 shows, the output voltage ripple is already quite high in this configuration with an external capacitor of 10 $\mu$F. This ripple must be taken into account in the system design to avoid unwanted operation of the processor core at too low voltages.

In addition to this ripple, the output voltage varies with the input voltage and with the load current. A worst-case scenario is a processor in burst-mode operation, where it toggles between idle and its highest operating frequency. The measurement in Fig. 4 shows such transients when a 250 mA current source is switched on at the output of a DC/DC step-down regulator optimized for a maximum load current of 250 mA. When the load current is switched on (Fig. 4a), the output voltage drops and the error amplifier has to increase the duty cycle of the DC/DC converter. Because of the integrating function of the feedback loop this takes some time, during which the output voltage keeps decreasing. Increasing the feedback-loop bandwidth helps reduce the voltage undershoot. Conversely, when the current is turned off, i.e. when the processor is set into idle (Fig. 4b), the output voltage increases because the inductor current continues to flow into the load capacitor. The voltage must not exceed the maximum operating conditions of the deep-submicron process technology. Increasing the output capacitor or decreasing the inductor reduces the overshoot.

**Fig. 4 (a,b)** Output voltage variation with load step
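Why a larger capacitor or a smaller inductor reduces the overshoot can be shown with a first-order charge balance. Assume the loop turns on the low-side switch MN immediately after the load is released, so the excess inductor current $\Delta I$ ramps down at $V_{out}/L$ and all of it charges $C$; then $\Delta V \approx L\,\Delta I^2 / (2\,C\,V_{out})$. Loop delay, ESR and ESL are ignored, so this is an idealized lower bound rather than a prediction of the measured overshoot. The 1.2 V output is an assumption; the other values are from the text.

```python
def release_overshoot(l: float, delta_i: float, c: float, v_out: float) -> float:
    """Idealized output overshoot (V) after a load step-down of delta_i (A).

    Assumes the low-side switch turns on at once, so the excess inductor current
    falls linearly at v_out / l and the whole triangle of charge goes into c.
    """
    t_fall = l * delta_i / v_out        # time for the excess current to reach zero
    charge = 0.5 * delta_i * t_fall     # area of the current triangle (C)
    return charge / c


# 2.2 uH, 250 mA step, 10 uF from the text; 1.2 V output assumed.
print(f"{release_overshoot(2.2e-6, 0.25, 10e-6, 1.2) * 1e3:.1f} mV")
```

The result scales with $L/C$, matching the qualitative statement above.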

Embedded SRAM blocks and PLLs in particular are often quite sensitive to under-voltage conditions or fast transients on the supply voltage. Higher switching frequencies reduce the output voltage ripple or enable the use of even smaller inductors, but they also increase the dynamic power loss in the switches MP and MN. The voltage undershoot can be reduced by increasing the bandwidth of the error feedback loop.

A careful stability analysis and optimization was carried out by combining a small-signal model of the current-mode buck converter [5] with the complete schematic of the error amplifier and its compensation.

The model in Fig. 5 contains the following blocks:

Error amplifier (OTA1) with Rcomp, Ccomp and Cout

Low-frequency model including gain:

$$f_1(s) = K \frac{1 + s R_C C_L}{1 + \frac{s}{\omega_p}}$$

$$\omega_p = \frac{1}{R_L C_L} + \frac{1}{f_s L C_L}\left(m_c (1 - D) - 0.5\right)$$

$$m_c = 1 + \frac{S_e}{S_n}$$

where $K$ is the DC gain, $S_e$ the slope of the compensating ramp, $S_n$ the slope of the sensed current, $f_s$ the clock frequency and $D$ the duty cycle.



**Fig. 5** Small-signal model of a DC/DC buck converter

High-frequency term of the buck converter, with a peak at half the clock frequency:

$$f_2(s) = \frac{1}{\left(1 + \frac{s}{(\omega_n Q)} + \frac{s^2}{(\omega_n)^2}\right)}$$

$$\begin{aligned}\omega_n &= \pi f_s \\ Q &= \frac{1}{\pi} \frac{1}{(m_c(1 - D) - 0.5)}\end{aligned}$$

The simulation results of this small-signal model match the measurements well and confirm stability over the different load conditions and duty cycles for a compensation ramp factor $m_c$ between 4 and 6.
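The two model terms can be evaluated numerically to see how $m_c$ controls the peaking at $f_s/2$. In this sketch $R_L$ is read as the load resistance, $C_L$ as the output capacitance and $R_C$ as its ESR, following Ridley's model [5]; the numbers (4.8 Ω load for 1.2 V at 250 mA, 5 mΩ ESR, normalized $K = 1$) are assumptions, and the error amplifier is not included.

```python
import math


def current_mode_buck_response(f, *, K, R_L, C_L, R_C, L, f_s, D, m_c):
    """Evaluate f1(s) * f2(s) of the model above at frequency f (Hz).

    Returns the complex response and the Q of the double pole at f_s / 2.
    """
    k = m_c * (1 - D) - 0.5
    if k <= 0:
        # Q would be infinite or negative: the model predicts subharmonic oscillation.
        raise ValueError("m_c * (1 - D) must exceed 0.5")
    s = 2j * math.pi * f
    w_p = 1 / (R_L * C_L) + k / (f_s * L * C_L)
    f1 = K * (1 + s * R_C * C_L) / (1 + s / w_p)
    w_n = math.pi * f_s
    q = 1 / (math.pi * k)
    f2 = 1 / (1 + s / (w_n * q) + (s / w_n) ** 2)
    return f1 * f2, q


# Assumed values: 1.2 V / 250 mA load, 10 uF with 5 mOhm ESR, 2.2 uH, 2.2 MHz.
PARAMS = dict(K=1.0, R_L=4.8, C_L=10e-6, R_C=5e-3, L=2.2e-6, f_s=2.2e6)

for D in (0.2, 0.33, 0.5):
    for m_c in (1.0, 4.0, 6.0):
        try:
            h, q = current_mode_buck_response(100e3, D=D, m_c=m_c, **PARAMS)
            print(f"D={D:.2f} m_c={m_c:.0f}: Q={q:.3f}, |H(100 kHz)|={abs(h):.3f}")
        except ValueError as err:
            print(f"D={D:.2f} m_c={m_c:.0f}: {err}")
```

Without a compensation ramp ($m_c = 1$) the model breaks down at $D = 0.5$, whereas $m_c$ in the range 4 to 6 keeps $Q$ well below 1 for all duty cycles shown.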

The peripheries of the processors and the SDRAM operate mainly from a 1.8 V rail. This rail, too, can be generated efficiently with another DC/DC buck converter. The load current of the SDRAM and the periphery is not constant but changes with the data transferred between the different modules. These transfers do not happen continuously at a fixed frequency but in high-frequency bursts, during which the current on the voltage rail increases significantly. Again, high efficiency over different load currents and low output voltage variation during load switching are important for this DC/DC buck converter.

A high-level model was developed to simulate all losses of the DC/DC converter (Fig. 6). This model is used together with a battery model to predict the battery operating time when a specific load profile is applied to the highly integrated power management unit (Fig. 7).
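A minimal version of such a high-level model lumps the losses into a resistive term, a fixed switching term and a quiescent term, and then averages the input power over a repeating load profile. This is a hypothetical sketch, not the model of Fig. 6: all parameter values are invented for illustration, and the battery is treated as an ideal energy store instead of a real battery model with voltage sag and rate-dependent capacity.

```python
from dataclasses import dataclass


@dataclass
class BuckLossModel:
    """Lumped loss model of a DC/DC converter (all parameters hypothetical)."""
    r_eff: float   # switch on-resistance + inductor DCR seen by the load current (ohm)
    p_sw: float    # frequency-dependent switching and gate-drive loss (W)
    p_q: float     # quiescent loss of the control circuitry (W)

    def input_power(self, v_out: float, i_load: float) -> float:
        return v_out * i_load + i_load**2 * self.r_eff + self.p_sw + self.p_q

    def efficiency(self, v_out: float, i_load: float) -> float:
        return v_out * i_load / self.input_power(v_out, i_load)


def runtime_hours(model, v_out, profile, battery_wh):
    """Battery life for a repeating profile of (duration_s, i_load_A) pairs,
    treating the battery as an ideal store of battery_wh watt-hours."""
    cycle_s = sum(t for t, _ in profile)
    energy_j = sum(t * model.input_power(v_out, i) for t, i in profile)
    return battery_wh / (energy_j / cycle_s)


core = BuckLossModel(r_eff=0.3, p_sw=5e-3, p_q=0.1e-3)
for i in (0.001, 0.01, 0.1, 0.25):
    print(f"{i * 1e3:6.0f} mA: efficiency {core.efficiency(1.2, i):.1%}")

burst = [(0.9, 0.01), (0.1, 0.25)]   # 90% idle, 10% full-speed bursts
print(f"runtime: {runtime_hours(core, 1.2, burst, battery_wh=3.0):.1f} h")
```

The fixed losses dominate at light load, which is why the text stresses high efficiency at low load: in a burst profile the converter spends most of its time there.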

Publications already demonstrate even higher integration of DC/DC converters using bond-wire inductors, integrated coils or air-core inductors integrated on the package [6–8]. To reduce the board space for the external inductor, a single inductor can be used to generate several output voltages [9–12]. The disadvantages are reduced efficiency and extremely critical regulation, which can become a problem especially when dynamic voltage scaling is used.

**Fig. 6** High-level model for efficiency simulations

**Fig. 7** Discharge simulation for an AA battery

Another trend is “digital power management”, where the output voltage (and the coil current) is measured with a high-frequency ADC and the control loop driving the switches is implemented with digital signal processing. The advantage is that the filter characteristics and the control law (for example, one based on fuzzy logic) can be changed with the load conditions or for different applications. The main disadvantages are the high power consumption of the ADC and the increased silicon area of this solution.
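To illustrate how a digital loop allows the compensator to change with the load, the sketch below uses a discrete PID compensator in velocity form acting on quantized ADC samples, with a gain set selected from the measured load current. This is a simpler stand-in for the fuzzy-logic control mentioned above, not the method of any specific product; the gains, the ADC resolution and the 100 mA switching threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gains:
    kp: float
    ki: float
    kd: float


# Hypothetical gain schedule: one compensator per load range.
GAIN_SCHEDULE = {
    "light": Gains(kp=0.5, ki=0.05, kd=0.0),
    "heavy": Gains(kp=2.0, ki=0.5, kd=1.0),
}


def select_gains(i_load: float, threshold: float = 0.1) -> Gains:
    """Choose the coefficient set from the measured load current (hypothetical rule)."""
    return GAIN_SCHEDULE["heavy" if i_load > threshold else "light"]


class DigitalPid:
    """Velocity-form PID acting on ADC samples of the output voltage.

    u[n] = u[n-1] + Kp*(e[n]-e[n-1]) + Ki*e[n] + Kd*(e[n]-2e[n-1]+e[n-2]),
    where u is the duty cycle, clamped to [0, 1].
    """

    def __init__(self, gains: Gains, adc_lsb: float, u0: float = 0.0):
        self.gains = gains
        self.adc_lsb = adc_lsb   # ADC resolution in volts (assumed)
        self.u = u0
        self.e1 = 0.0            # e[n-1]
        self.e2 = 0.0            # e[n-2]

    def quantize(self, v: float) -> float:
        """Ideal ADC: round the sampled voltage to the nearest code."""
        return round(v / self.adc_lsb) * self.adc_lsb

    def step(self, v_ref: float, v_out: float) -> float:
        e = v_ref - self.quantize(v_out)
        g = self.gains
        self.u += (g.kp * (e - self.e1) + g.ki * e
                   + g.kd * (e - 2 * self.e1 + self.e2))
        self.u = min(max(self.u, 0.0), 1.0)   # duty-cycle limits
        self.e2, self.e1 = self.e1, e
        return self.u


pid = DigitalPid(select_gains(i_load=0.25), adc_lsb=0.005)
for _ in range(3):
    duty = pid.step(v_ref=1.2, v_out=1.19)
print(f"duty cycle after 3 samples: {duty:.3f}")
```

The velocity form keeps only the previous output and two past errors, so swapping coefficient sets between samples does not cause a jump in the duty cycle.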

### 2.3 Low Dropout Voltage Regulators (LDO)

The RF functions of several chipsets must be supplied with a low-noise voltage of around 2.5 to 3 V, so a linear regulator from the battery is the best choice [13]. The key requirements for the LDO are fast correction of transients from the supply or the load and low output voltage noise. In addition, an on-resistance of less than 1 Ω for the PMOS pass device guarantees operation down to the lowest battery voltages. All these requirements are best met by the architecture shown in Fig. 8. A high-gain amplifier with internal compensation is designed to guarantee stability over a wide range of load conditions, and the high-bandwidth amplifier in the inner loop is optimized for transients on the load and on the input voltage.
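The link between on-resistance and the lowest usable battery voltage is a one-line calculation: in dropout the pass device is fully on and behaves like its on-resistance, so the input must exceed the output by at least $I_{load} R_{on}$. The 2.8 V, 150 mA RF rail below is a hypothetical example, and a real design needs some extra margin to keep the loop gain up near dropout.

```python
def min_battery_voltage(v_out: float, i_load: float, r_on: float) -> float:
    """Lowest input voltage at which a PMOS LDO still regulates (ideal dropout)."""
    return v_out + i_load * r_on


# Hypothetical RF rail: 2.8 V at 150 mA.
for r_on in (0.5, 1.0, 2.0):
    v_min = min_battery_voltage(2.8, 0.15, r_on)
    print(f"R_on = {r_on:.1f} ohm -> V_bat >= {v_min:.3f} V")
```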

The audio codec can be operated at 1.8 to 2.5 V and, again, a low-noise, LDO-regulated voltage is required for this block. There is a big advantage if the codec can be supplied with just below 1.8 V: the 1.8 V DC/DC buck converter can then be reused as the input of the LDO. Such a configuration increases the overall efficiency for the audio codec by 40%, and a total power consumption below 6 mW from the Li-Ion battery can be achieved for the stereo DAC including the headphone amplifier. Note that for LDO input voltages of 1.8 V and below, the PMOS pass transistor must be replaced by an NMOS device, because the PMOS device becomes very large due to the low gate overdrive voltage. At low supply voltages of e.g. 2.7 V the gate overdrive of the NMOS transistor is not sufficient, so a step-up charge pump is used to generate a voltage of around 5 V (Fig. 9).

**Fig. 8** High performance LDO

**Fig. 9** NMOS-LDO for low output voltages
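The mechanism behind the efficiency gain can be sketched by comparing an LDO fed directly from the battery with a buck converter followed by an LDO. The numbers are assumptions chosen only to show the mechanism: a 3.6 V battery, a 90% efficient 1.8 V buck converter and a codec rail of 1.7 V (“just below 1.8 V”); quiescent currents are ignored.

```python
def ldo_eff(v_in: float, v_out: float) -> float:
    """Ideal LDO efficiency; quiescent current ignored."""
    return v_out / v_in


V_BAT = 3.6      # assumed nominal Li-Ion voltage (V)
V_CODEC = 1.7    # hypothetical codec rail just below 1.8 V (V)
ETA_BUCK = 0.90  # assumed efficiency of the 1.8 V buck converter

direct = ldo_eff(V_BAT, V_CODEC)              # LDO straight from the battery
cascade = ETA_BUCK * ldo_eff(1.8, V_CODEC)    # buck to 1.8 V, then LDO
print(f"direct LDO: {direct:.1%}, buck + LDO: {cascade:.1%}")
```

The LDO in the cascade only drops 100 mV, so nearly all of the conversion happens in the efficient buck stage; this small headroom is also why an NMOS pass device is needed at such low input voltages.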

### 2.4 DC/DC Buck-Boost and Hybrid Converters

A typical NAND Flash and some other peripherals operate from a 3 V supply. This rail could be supplied from the Li-Ion battery with a linear regulator at an average efficiency of 83% (calculated from the 3.6 V nominal voltage), but a DC/DC buck converter increases the efficiency to over 90%. A hybrid converter that combines a linear regulator (LDO) with a DC/DC buck converter could also be used [14, 15]. Another alternative is a DC/DC buck/boost converter, which allows the battery to be discharged down to 2.7 V. Calculations, measurements and simulations using high-level power consumption models showed that, because buck/boost converters are less efficient than buck converters, the total power budget is better for a buck converter solution. In addition, the buck/boost converter is more expensive. This will change, however, with new battery technologies whose voltages go down to 2.4 V; then a buck/boost converter improves the total power budget.

### 2.5 Lighting Management

A major part of the power is consumed by the display backlight; larger displays of 3 to 4 inches in particular have to be illuminated with 5 to 6 white LEDs. The total power consumption of such a backlight is between 200 mW and 500 mW, so a DC/DC converter with the highest possible efficiency is required. A series connection of the LEDs is preferred to obtain uniform backlighting. A high-voltage DC/DC boost converter and a single current sink with either pulse-width modulation or DC current regulation are used to control the illumination (Fig. 10). A high-voltage CMOS process even allows integration of the HV-NMOS, the diode, the current-sense resistor and the feedback divider. Logarithmic dimming of this current sink produces a smooth turn-on/turn-off effect of the display. Besides the display backlight, portable devices contain an increasing number of LEDs for keypad illumination and a variety of indicator lights, and RGB LEDs are used to generate individual lighting effects. All LEDs are controlled by integrated light-pattern generators with simple commands over the serial interface to the processor, which reduces the load on the processor and on the serial interface.

**Fig. 10** DC/DC boost converter
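Two pieces of arithmetic behind the backlight driver can be sketched: the boost output needed for a series LED string (with the ideal continuous-conduction relation $V_{out} = V_{in}/(1-D)$) and a logarithmic dimming table in which every code changes the current by the same ratio. The LED count, forward voltage, sense drop and the 0.2 to 20 mA, 16-code table are assumptions for illustration.

```python
def boost_duty(v_in: float, v_out: float) -> float:
    """Ideal CCM boost converter: V_out = V_in / (1 - D)."""
    return 1 - v_in / v_out


def log_dimming_table(i_min: float, i_max: float, n_codes: int) -> list:
    """Current-sink settings spaced geometrically, so each code changes the
    brightness by the same ratio (a smooth, logarithmic fade)."""
    ratio = (i_max / i_min) ** (1 / (n_codes - 1))
    return [i_min * ratio**k for k in range(n_codes)]


N_LEDS, V_F, V_SENSE = 6, 3.2, 0.2     # assumed LED count, forward voltage, sense drop (V)
v_string = N_LEDS * V_F + V_SENSE       # boost output voltage needed
print(f"V_out = {v_string:.1f} V, duty at 3.6 V input = {boost_duty(3.6, v_string):.2f}")

table = log_dimming_table(0.2e-3, 20e-3, 16)   # 0.2 mA ... 20 mA in 16 codes
print([f"{i * 1e3:.2f}" for i in table[:4]], "...", f"{table[-1] * 1e3:.2f} mA")
```

The series string needs a boost ratio of more than 5 from a Li-Ion cell, which is why a high-voltage process is required for the integrated switch and diode.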

Portable devices with a flash light require currents of up to 1000 mA for high-brightness LEDs over short periods of less than a second. These currents are best generated by charge pumps with two external flying capacitors. Such a charge pump can operate in $1\times$, $1.5\times$ and $2\times$ modes and delivers the output current with high efficiency. Using external capacitors instead of coils saves board space and reduces electromagnetic emission, which is a big advantage for systems with RF receivers.
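Mode selection in such a charge pump can be sketched as picking the lowest gain whose ideal output still covers the LED voltage plus the current-sink headroom. An ideal pump with gain $M$ draws $M$ times the output current from the input, so its efficiency is $V_{LED}/(M V_{in})$. The 3.8 V flash-LED forward voltage and the 0.2 V headroom are assumptions, and switch losses are ignored.

```python
MODES = (1.0, 1.5, 2.0)   # charge-pump gains named in the text


def select_mode(v_in: float, v_led: float, headroom: float = 0.2):
    """Return (gain, ideal efficiency) for the lowest usable charge-pump mode."""
    for m in MODES:
        if m * v_in >= v_led + headroom:
            return m, v_led / (m * v_in)
    raise ValueError("input voltage too low even in 2x mode")


for v_bat in (4.2, 3.6, 3.0):
    m, eta = select_mode(v_bat, v_led=3.8)   # assumed flash-LED forward voltage
    print(f"V_bat={v_bat:.1f} V: {m}x mode, ideal efficiency {eta:.0%}")
```

Staying in the lowest possible mode matters because every step up in gain raises the input current for the same LED current.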

## 3 A Highly Integrated Power Management IC

Portable devices must be small enough to be widely adopted by users, which is only possible with a high level of integration [16, 17]. Discrete power management components allow flexible system development with late changes in the product design phase, but the result is a large PCB with a high number of components on both sides.

A solution with a highly integrated power management unit (PMU) is shown in Fig. 11. The IC is packaged in a 0.5 mm-pitch BGA package and all coils and capacitors are placed around the IC. Such a system solution has many advantages:

- The size of the PCB and the thickness of the device can be reduced substantially and the number of passive components and the interconnect decreases.
- The individual regulators on the PMU have higher programmability - all controlled by a single interface.
- A single high precision reference block can be used on the PMU which then supplies all other regulators.
- A single clock reference is used to control the individual DC/DC converters and to guarantee a synchronous operation resulting in much lower interference with e.g. RF blocks.
- Combinations of DC/DC converters with LDOs or current sinks result in improved performance and efficiency.
- A single general-purpose ADC with an input multiplexer can be used to monitor all voltage rails and also the junction temperature of the IC.



**Fig. 11** PCB with highly integrated Power Management IC

The startup and reset sequences of the device are highly configurable to shorten the system development time [18]. The configuration of these sequences is defined by the ratio between an internal bias resistor, trimmed with high accuracy, and an external resistor, RPROGRAM. At the beginning of each reset cycle a 3-bit A/D conversion is performed. Its result selects 1 of 8 possible address ranges of an internal metal-mask-programmable ROM. The information in the ROM defines the following parameters:

- Default voltage levels for all regulators and step-down DC/DC converters
- Power-on sequence of all regulators and step-down DC/DC converters
- Duration of the reset cycle
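The reset-time configuration step can be sketched as a 3-bit quantization of a resistor ratio followed by a ROM lookup. The text does not specify the measurement circuit, so the divider ratio $R_{PROGRAM}/(R_{bias}+R_{PROGRAM})$ and its uniform quantization are assumptions, and the ROM contents below are hypothetical placeholders (only two of the eight address ranges are filled in).

```python
from typing import NamedTuple


class ResetConfig(NamedTuple):
    """One ROM entry (contents hypothetical)."""
    default_mv: dict        # default voltage per regulator (mV)
    power_on_order: tuple   # order in which regulators are enabled
    reset_ms: int           # duration of the reset cycle


def program_code(r_program: float, r_bias: float, bits: int = 3) -> int:
    """Quantize the assumed divider ratio R_PROGRAM / (R_bias + R_PROGRAM)."""
    ratio = r_program / (r_bias + r_program)
    levels = 2 ** bits
    return min(int(ratio * levels), levels - 1)


# Hypothetical ROM: 8 address ranges, only two filled in for brevity.
ROM = {
    0: ResetConfig({"core": 1200, "io": 1800}, ("io", "core"), 10),
    4: ResetConfig({"core": 1100, "io": 1800, "rf": 2800}, ("io", "rf", "core"), 20),
}

code = program_code(r_program=100e3, r_bias=100e3)   # equal resistors: ratio 0.5
print(code, ROM.get(code))
```

Because only a ratio is measured, the selection is insensitive to the absolute value of the internal resistor once it is trimmed, and a single external resistor is enough to choose among the eight configurations.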

Adding advanced audio processing to the power management unit further increases the level of integration. This can easily be done without additional voltage rails. Careful chip design and layout techniques are used to achieve a signal-to-noise ratio of over 95 dB, and careful PCB layout is required to obtain this high performance in all use cases of the portable device.

A chip layout plot of the highly integrated power management unit is shown in Fig. 12. The device is manufactured in a 0.35  $\mu\text{m}$  CMOS process and contains all power management functions required for portable devices such as multimedia players, satellite radio receivers or navigation devices and in addition a complete high fidelity stereo audio codec.



**Fig. 12** Layout plot of the highly integrated Power Management IC

## 4 Conclusions

A highly integrated power management IC has been presented, and the implementation challenges of its key building blocks have been discussed. Integrating such complex power management ICs is a challenge for system designers, but thanks to its extensive programmability and flexibility, the IC can be used in many different portable devices. The high integration and the advanced technology and design architectures of this IC enable portable devices with the highest power efficiency that are more compact, thinner and cheaper.

The author thanks the Analog/Mixed-Signal design team and P. Trattler, Product Manager for Power Management ICs, for the valuable ideas and discussions.

## References

1. D. Monticelli, "Taking a system approach to energy management". Proceedings of the 29th European Solid-State Circuits Conference, Vol. 29, pp. 15–19, Sep 2003.
2. B. Zhai, D. Blaauw, D. Sylvester, K. Flautner, "Theoretical and Practical Limits of Dynamic Voltage Scaling". <http://www.gigascale.org/pubs/495/insomniac.author.submit.pdf>.
3. J. Tschanz, N. Kim et al., "Adaptive Frequency and Biasing Techniques for Tolerance to Dynamic Temperature-Voltage Variations and Aging"; ISSCC 2007 / Session 16 / Power Distribution and Management, February 2007.
4. C. Lee, P. Mok, "A monolithic current-mode CMOS DC-DC converter with on-chip current-sensing technique", IEEE Journal of Solid-State Circuits, Vol. 39, pp. 3–14, Jan 2004.
5. R.B. Ridley, An Accurate and Practical Small-Signal Model for Current-Mode Control; <http://www.ridleyengineering.com>.
6. A. Richelli, L. Colalongo, M. Quarantelli, M. Carmina, Zs. M. Kovacs-Vajna, "A fully integrated inductor-based 1.8-6-V step-up converter", IEEE Journal of Solid-State Circuits, Vol. 39, pp. 242–245, Jan 2004.
7. P. Hazucha, G. Schrom et al., "A 233-MHz 80%–87% efficient four-phase DC-DC converter utilizing air-core inductors on package", IEEE Journal of Solid-State Circuits, Vol. 40, pp. 838–845, Apr 2005.
8. M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, P. Palmer, "A 3 GHz Switching DC-DC Converter Using Clock-Tree Charge-Recycling in 90 nm CMOS with Integrated Output Filter", ISSCC 2007, Analog and Power Management Techniques, 29.8; February 2007.
9. S. Hoon, N. Culp, J. Chen, F. Maloberti, "A PWM dual-output DC/DC boost converter in a 0.13  $\mu$ m CMOS technology for cellular-phone backlight application", Proceedings of the 31st European Solid-State Circuits Conference, Vol. 31, pp. 81–84, Sep 2005.
10. D. Ma, W.-H. Ki, C.-Y. Tsui, P. Mok, "Single-inductor multiple-output switching converters with time-multiplexing control in discontinuous conduction mode", IEEE Journal of Solid-State Circuits, Vol. 38, pp. 89–100, Jan 2003.
11. E. Bonizzoni, F. Borghetti, P. Malcovati, F. Maloberti, B. Niessen, "A 200 mA 93% Peak Efficiency Single-Inductor Dual-Output DC-DC Buck Converter", ISSCC 2007, Analog and Power Management Techniques, 29.5; February 2007.
12. H-P. Le, C-S. Chae, K-C. Lee, S-W. Wang, G-H. Cho, G-H. Cho, "A Single-Inductor Switching DC-DC Converter With Five Outputs and Ordered Power-Distributive Control", IEEE Journal of Solid-State Circuits, Vol. 42, No. 12, December 2007.
13. W. Oh, B. Bakkaloglu, B. Aravind, S. Hoon, "A low 1/f noise CMOS low-dropout regulator with current-mode feedback buffer amplifier", IEEE Custom Integrated Circuits Conference, Vol. 19, pp. 213–216, September 2006.
14. M. Hiraki, T. Ito, A. Fujiwara, T. Ohashi, T. Hamano, T. Noda, "A 63-uW standby power microcontroller with on-chip hybrid regulator scheme", IEEE Journal of Solid-State Circuits, Vol. 37, pp. 605–611, May 2002.
15. T. Barber, S. Ho, P. Ferguson, "Multi-mode CMOS Low dropout voltage regulator for GSM Handsets", Symp. VLSI Circuits Dig., Vol. 16, pp. 284–287, June 2002.
16. C. Shi, B. Walker, E. Zeisel, B. Hu, G. McAllister, "A highly integrated power management IC for advanced mobile applications", IEEE Custom Integrated Circuits Conference, Vol. 19, pp. 85–88, September 2006.
17. C. Shi, B. C. Walker, E. Zeisel, B. Hu, and G. H. McAllister, "A highly integrated power management IC for advanced mobile applications", IEEE Journal of Solid-State Circuits, Vol. 42, pp. 1723–1731, August 2007.
18. T. Bühler, H. Haiplik, T. Jessenig, M. Lueger, "austriamicrosystems AG; Integrated circuit arrangement, and method for programming an integrated circuit arrangement" Patent: EP1625663, WO2004105246.

# Wideband Efficient Amplifiers for On-Chip Adaptive Power Management Applications

Lázaro Marco, Vahid Yousefzadeh, Albert García-Tormo, Alberto Poveda, Dragan Maksimović and Eduard Alarcón

**Abstract** This chapter reviews system-level and circuit-level implementation aspects of strategic adaptive power management techniques that require wideband, efficient power amplification and that are crucial for power-demanding loads in portable devices, such as envelope tracking for polar RF power transmitters and on-chip line drivers for power line communications. The stringent specifications of such amplifiers pose significant challenges, both in assessing the system-level impact of the amplifier limitations and in designing the power converter, in terms of both converter topology and control. Advanced topologies aimed at miniaturization and wideband low-distortion operation, namely multi-level conversion and a linear-assisted scheme, are discussed. Details of advanced modulation and control methods are shown, namely low-oversampling-ratio sigma-delta modulation with a high-order filter, and digital predistortion of the output filter dynamics built into the PWM modulation.

## 1 Introduction

In portable telecommunications and computing systems, the continuous trend towards miniaturizing power processing subsystems stems from their global system-level impact on volume and weight, and thus on portability. Efficiency in energy processing also directly affects operating lifetime. This is particularly true for current and future generations of systems-on-chip (SoC), a trend that provides a line of convergence for implementing current and future portable, mobile and autonomous systems, in which power management is one of the key performance-limiting factors with regard to ergonomics and operating time. The ultimate step therefore consists in the fully monolithic integration of the power converter together with the circuits that constitute its load, within either the same substrate or the same chip package.

On-chip efficient power management circuits usually focus on integrating a regulator, a circuit that maintains a stable output voltage regardless of perturbations in both the output load and the input power source. There are, however, a few challenging applications in which the output of the converter is expected to track a wideband time-varying signal. Among the applications for such wideband amplifiers, several have stringent specifications: (A) efficient line drivers for Power Line Communications, (B) system-level optimization of power consumption in digital circuits via Adaptive Voltage Scaling (AVS), (C) MEMS-based capacitive actuators, (D) audio amplifiers, and (E) the Envelope Elimination and Restoration (EER) or Kahn technique and similar envelope-tracking RF transmitter architectures. The latter theoretically allow the implementation of linear, highly efficient RF power amplifiers by adaptively supplying the RF PA with the wideband baseband envelope signal through a switching power converter of the type usually applied to efficient DC-DC conversion. Such amplifiers are required by modern digital modulations with significant envelope modulation, such as those used in EDGE and UMTS as well as in WLAN and 4G applications, and are potentially suited to RF power amplification in multistandard and digital radio architectures.

---

E. Alarcón (✉)  
Technical University of Catalunya, Barcelona, Spain

One of the key remaining challenges for a successful realization of such systems is the practical implementation of the efficient, wide-bandwidth tracking power converter. This challenge is emphasized when an on-chip implementation targeting battery-operated mobile terminals is envisaged.

This chapter focuses on different system-level and circuit-level aspects (both converter topology and control) of the on-chip integration of a switching power converter for wideband tracking applications.

## 2 EER Technique: System-Level Impact of Circuit-Level Effects

One of the most representative adaptive power management techniques, the adaptive power management of an RF PA (Fig. 1), whose system-level implications drive the design of its power supply, is discussed in the following.

RF power amplifiers (RFPAs) are the dominant power consumers in battery-operated terminals for communication systems. Systems like GSM employ modulation schemes that generate constant-amplitude RF outputs, which allows the use of high-efficiency switched-mode RF power amplifiers (class E, class F). With the growing emphasis on channel capacity, newer generations of communication systems (such as EDGE, CDMA or WCDMA) use non-constant-envelope RF signals (with MHz bandwidths) associated with spectrum-efficient digital modulations. Unfortunately, amplifying non-constant-envelope signals requires linear RF power amplifiers, which inherently have lower efficiency. In this scenario, the Envelope Elimination and Restoration (EER) technique (Fig. 1) or, more generally, polar modulation techniques have been proposed to improve the efficiency of RFPA systems by employing an efficient switched-mode RFPA supplied from an envelope-tracking power converter. Although the focus here is primarily on low-power battery-operated systems typical of mobile handsets, the techniques presented can also be applied to high-power RFPA systems commonly found in communication base stations.

**Fig. 1** Envelope tracking technique for RF power amplifiers

The Envelope Elimination and Restoration (EER) technique theoretically allows the implementation of linear, highly efficient RF power amplifiers, as required for next-generation digital communications. One of the key remaining challenges for a successful implementation of the EER technique is the efficient implementation of the switching power converter that amplifies the baseband envelope signal: envelope bandwidths on the order of several MHz are expected, which require very high switching frequencies and thus compromise efficiency. This section investigates the feasibility of the EER technique by studying the impact of the nonidealities of the switching power converter tracking process, namely its limited bandwidth and its ripple, on the overall polar EER amplification scheme. Using a two-tone test input signal, a design space exploration of the distortion caused by both nonideal effects is carried out in terms of the output spurious-free dynamic range (SFDR). Design criteria for the optimum filter characteristic and for phase compensation between the polar paths are derived. The section concludes by extending the switching power converter design criteria to an actual CDMA modulation signal.

The EER technique, a circuit-level representation of which is shown in Fig. 2, uses separate power amplification paths for the polar representation of complex signals: the narrowband phase modulation signal $\varphi(t)$ is amplified by a tuned, efficient class E or class F switching RF PA, and the broadband baseband envelope signal $e(t)$ by a switching power converter of the type usually applied to efficient DC-DC conversion. With this configuration the RFPA supply is adaptively modulated so as to restore the envelope onto the phase-modulated signal, so the technique can provide highly efficient, linear power amplification overall.
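The polar split and its ideal recombination can be sketched at complex baseband (the RF carrier is left out). The envelope path carries $e(t) = |x(t)|$, the phase path carries $\varphi(t) = \arg x(t)$, and an ideal supply-modulated PA multiplies the constant-envelope phase signal by the amplified envelope. The random I/Q samples are only a stand-in for a modulated signal; the function names are illustrative.

```python
import numpy as np


def to_polar(x: np.ndarray):
    """Split a complex baseband signal into envelope e(t) and phase phi(t)."""
    return np.abs(x), np.angle(x)


def eer_recombine(e: np.ndarray, phi: np.ndarray, gain: float = 1.0) -> np.ndarray:
    """Ideal EER: the supply-modulated PA multiplies the constant-envelope
    phase signal exp(j*phi) by the amplified envelope."""
    return gain * e * np.exp(1j * phi)


rng = np.random.default_rng(0)
x = rng.normal(size=1000) + 1j * rng.normal(size=1000)   # stand-in for I/Q data
e, phi = to_polar(x)
y = eer_recombine(e, phi, gain=10.0)
print("max reconstruction error:", np.max(np.abs(y - 10.0 * x)))
```

With ideal paths the output is an exact scaled copy of the input; the nonidealities studied in this section (limited envelope bandwidth, switching ripple, path delay mismatch) would be modeled by altering `e` or `phi` before recombination.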

Several design challenges still preclude the complete success of this strategic technique. Envelope detector and limiter imperfections, delay mismatch between the two signal paths, and supply-AM/supply-PM distortion at the PA all degrade system linearity. One of the key remaining challenges for a successful realization of such a system is the practical implementation of the efficient, wide-bandwidth envelope-tracking power converter. The following discusses, at system level, how the limited bandwidth of an actual buck switching power converter, together with its inherent ripple, affects the EER technique, both for a two-tone test signal and for a CDMA signal.



**Fig. 2** Circuit-level representation of EER with two-tone test input

The ideal EER scheme introduces no distortion in the input signal path, but any nonideality directly results in a nonlinear input-output characteristic. The two-tone test signal, the sum of two sinusoidal signals closely spaced around the carrier, is widely used to characterize amplifiers in communication systems, since its envelope is a rectified sinusoid with infinite bandwidth that covers the complete amplitude dynamic range. The effect of the two-tone test on the EER system is best analyzed in the frequency domain, as illustrated in Fig. 3.
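The infinite bandwidth of the two-tone envelope follows from the Fourier series of a rectified sinusoid. Assuming two unit-amplitude tones at $f_c \pm f_x$, the complex-baseband signal is $2\cos(2\pi f_x t)$ and its envelope is

$$e(t) = 2\left|\cos(2\pi f_x t)\right| = \frac{4}{\pi} + \frac{8}{\pi}\sum_{k \ge 1} \frac{(-1)^{k+1}}{4k^2 - 1}\cos\left(2k \cdot 2\pi f_x t\right),$$

so it contains only even multiples of $f_x$, decaying as $1/k^2$ but never reaching zero. The sketch below checks this numerically with an FFT over one period of the baseband tone; the phase path, not computed here, flips by $\pi$ at every envelope zero.

```python
import numpy as np

N = 4096                                        # samples over one baseband-tone period
theta = 2 * np.pi * np.arange(N) / N            # theta = 2*pi*f_x*t
x = np.exp(1j * theta) + np.exp(-1j * theta)    # tones at +/- f_x around the carrier
e = np.abs(x)                                   # envelope: rectified sinusoid

X = np.fft.rfft(e) / N
dc = X[0].real
amps = 2 * np.abs(X[1:11])                      # amplitude of the m*f_x component, m = 1..10
for m, a in enumerate(amps, start=1):
    print(f"{m:2d} f_x: {a:.4f}")
print(f"DC: {dc:.4f}")
```

An envelope converter with a bandwidth of a few times $f_x$ therefore always truncates this series, which is the filtering distortion studied below.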

Once the input signal has been chosen, the minimum set of representative switching power converter parameters for the subsequent design space exploration has to be identified. Referring to Fig. 3, it is proposed to use the ratio of the filter frequency to the baseband tone frequency, $f_0/f_x$, to characterize filter effects, and the ratio of the switching frequency to the filter frequency, $f_s/f_0$, to characterize switching effects. Note that these indexes are relative and hence general-purpose.

**Fig. 3** EER scheme including PWM and filtering for a two-tone test

Filtering and switching, which are heterogeneous in nature, result in harmonic and non-harmonic distortion, respectively. It is therefore argued that the conventional intermodulation product parameter is not an appropriate performance index. Instead, the spurious-free dynamic range (SFDR) is proposed, since it describes both scenarios with a unified distortion measure.
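A minimal SFDR calculation treats harmonic and non-harmonic spurs alike: it is the ratio, in dB, between the wanted tone and the largest other spectral component, whatever its origin. The sketch assumes coherent sampling (an integer number of tone periods in the record), so no window is needed; the test signal with a third harmonic and an unrelated spur is synthetic.

```python
import numpy as np


def sfdr_db(signal: np.ndarray, fund_bin: int) -> float:
    """Spurious-free dynamic range in dB: the wanted tone over the largest other
    component, DC excluded. Assumes coherent sampling (no window)."""
    mag = np.abs(np.fft.rfft(signal))
    spurs = mag.copy()
    spurs[0] = 0.0            # ignore DC (e.g. the mean of a restored envelope)
    spurs[fund_bin] = 0.0     # ignore the wanted tone itself
    return 20 * np.log10(mag[fund_bin] / spurs.max())


N, k = 4096, 8                # 8 periods of the wanted tone in the record
n = np.arange(N)
tone = np.sin(2 * np.pi * k * n / N)
harmonic = 0.01 * np.sin(2 * np.pi * 3 * k * n / N)   # harmonic distortion
spur = 0.003 * np.sin(2 * np.pi * 37 * n / N)         # non-harmonic component
distorted = tone + harmonic + spur
print(f"SFDR = {sfdr_db(distorted, k):.1f} dB")
```

Here the harmonic is the largest spur; if switching produced a larger non-harmonic component, the same index would report that instead.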

Since the PWM process is nonlinear, the system cannot be studied as a superposition of filtering and modulation effects. However, a separate, design-oriented study of these effects reveals the behavioral trends.
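The modulation step on its own can be sketched as naturally sampled PWM: the output is high while the normalized envelope exceeds a symmetric triangular carrier at $f_s$. For a constant input the duty cycle equals the input level; for a time-varying envelope the output also contains components around multiples of $f_s$ that the filter must remove. The 1 MHz carrier and the sampling density are assumptions.

```python
import numpy as np


def natural_pwm(m: np.ndarray, t: np.ndarray, f_s: float) -> np.ndarray:
    """Naturally sampled PWM: 1 while the modulating signal m(t) (normalized
    to 0..1) exceeds a symmetric triangular carrier at f_s, else 0."""
    phase = (t * f_s) % 1.0
    carrier = 1.0 - 2.0 * np.abs(phase - 0.5)   # rises 0 -> 1 -> 0 each period
    return (m > carrier).astype(float)


f_s = 1e6                                     # assumed switching frequency (Hz)
t = np.arange(200_000) / (1000 * f_s)         # 200 switching periods, 1000 samples each
m = np.full_like(t, 0.3)                      # constant envelope level
p = natural_pwm(m, t, f_s)
print(f"average output: {p.mean():.3f}")
```

Replacing `m` with a sampled two-tone envelope and feeding `p` through the envelope filter reproduces, in simplified form, the combined PWM-plus-filter chain of Fig. 3.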

The need for high ripple rejection at moderate switching frequencies in EER is usually met with 4th-order lowpass LC ladder filters in the PWM amplifiers, which recover the signal after the modulator. The filter transfer function is modeled as a cascade of second-order biquads with adjustable quality factor and cutoff frequency. Phase compensation between the envelope and phase paths (see Fig. 3) has been analyzed for different cases, namely no compensation, a matched filter, a delay, and a matched allpass filter. The use of a linear-phase Bessel filter has also been investigated. For each case, a sweep of simulations has been carried out for the restored baseband tone associated with the two-tone test, both in the time domain for illustrative purposes and in the frequency domain, from which the SFDR is obtained.
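To make the biquad-cascade view concrete, the following sketch splits a 4th-order lowpass denominator into its two second-order sections $(\omega_0, Q)$ and measures the relative group-delay variation across the $-3$ dB passband. It compares the 4th-order Bessel polynomial $s^4+10s^3+45s^2+105s+105$ with a 4th-order Butterworth filter; the Butterworth case is an assumed point of comparison, and both prototypes are normalized rather than taken from the chapter's hardware.

```python
import numpy as np

def biquads(den):
    """Split a lowpass denominator into second-order sections, returned as (w0, Q)."""
    upper = [p for p in np.roots(den) if p.imag > 1e-12]   # one pole per conjugate pair
    return sorted((abs(p), abs(p) / (-2.0 * p.real)) for p in upper)

def response(den, w):
    """Frequency response with unity DC gain: H(jw) = den(0) / den(jw)."""
    return np.polyval(den, 0.0) / np.polyval(den, 1j * w)

def cutoff(den):
    """First frequency at which |H| drops below -3 dB (fine-grid search)."""
    w = np.linspace(1e-4, 10.0, 200_001)
    return w[np.argmax(np.abs(response(den, w)) < 1.0 / np.sqrt(2.0))]

def group_delay_spread(den):
    """Relative group-delay variation across the -3 dB passband."""
    w = np.linspace(1e-4, cutoff(den), 4001)
    tau = -np.gradient(np.unwrap(np.angle(response(den, w))), w)
    return (tau.max() - tau.min()) / tau[0]

n = 4
bessel = np.array([1.0, 10.0, 45.0, 105.0, 105.0])          # 4th-order Bessel polynomial
butter = np.real(np.poly([np.exp(1j * np.pi * (2 * k + n - 1) / (2 * n))
                          for k in range(1, n + 1)]))       # 4th-order Butterworth

for name, den in (("Bessel", bessel), ("Butterworth", butter)):
    sections = [(round(w0, 3), round(q, 3)) for w0, q in biquads(den)]
    print(name, sections, "group-delay spread:", round(group_delay_spread(den), 3))
```

The Bessel sections have lower Q values and a much flatter passband group delay, which is consistent with a plain delay in the phase path being sufficient to align the two polar paths.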

As an example, Fig. 4 shows the distorted output signal as a function of the ratio  $f_0/f_x$  of the filter frequency to the baseband tone frequency and of the filter quality factor  $Q$, for the double-biquad filter with an allpass filter of the same phase behaviour in the phase path. The SFDR obtained from this set of simulations is shown in Fig. 5(a) over the complete filter parameter design space.

An SFDR comparison of all the simulated EER schemes is shown in Fig. 5(b). It shows that the best-performing system uses a Bessel filter in the envelope path and a plain delay in the phase path.

In the preceding characterization, the SFDR was obtained considering filtering only, whereas the following results correspond to filtering and switching applied concurrently, hence completely modeling an actual switching power converter. To include PWM in the design space exploration, the ratio of the switching frequency to the filter cut-off frequency is taken as the indicator of PWM modulation. In this case, the two-dimensional design space (quality factors are assumed to correspond to maximally flat behavior, *i.e.*,  $Q = 1/\sqrt{2} \approx 0.7$ ) yields a matrix of simulations. It is relevant to evaluate the impact of PWM and filtering upon the internal envelope signal associated with the two-tone test. Figure 6(a) shows the full parameter sweep, in which filtering is observed to notably affect the fidelity of the envelope signal, while switching only adds a ripple component.

In order to investigate how these combined effects are reflected at the output, Fig. 6(b) depicts the restored output signal spectra so as to obtain the SFDR index. The case under consideration is the optimal one as regards filtering alone, *i.e.*, the Bessel filter. Note that both harmonic and nonharmonic distortion can be observed.



**Fig. 4** EER output signal when different cut-off frequencies and Q factors are considered. All-pass filter phase compensation case



**Fig. 5** (a) SFDR for double biquad and all-pass filter providing phase compensation (b) Comparison of different filtering and phase compensation schemes

Figure 7(a) shows the SFDR values obtained, whereas Fig. 7(b) shows the relative distortion improvement due to the Bessel filter configuration. For frequency ratios higher than four, the improvement is flat at around 10 dB.

The preceding design space exploration for a two-tone test signal, investigated in terms of output distortion, is extended here to an actual CDMA signal. The upper waveform in Fig. 8(a) shows the noise-like, 3 MHz bandwidth signal associated with the envelope of the CDMA signal used in the IS-95 digital communications standard. To illustrate the effect of the buck converter, the lower waveform shows its output for  $f_0 = 10$  MHz and  $f_s = 15$  MHz.

To evaluate the impact of converter nonidealities upon the output signal spectrum, from which system-level performance indexes such as adjacent channel interference and spectral mask compliance can be assessed, Fig. 8(b) shows the spectra of the output signal (radiated by the antenna) over the complete buck converter parameter space. Filtering effects raise the plateau associated with spectral regrowth, whereas switching effects produce spectral components that may strongly interfere with adjacent channels.

The system/circuit behavioural study discussed in this section could also be applied to other wideband amplifier applications with stringent circuit-level specifications and sophisticated system-level requirements, such as line drivers for Power Line Communications (PLC).

## 3 Topologies for Wideband Efficient Amplification

Several power converter schemes have been introduced for the envelope-tracking supply of RF power amplifiers. In this section, alternative approaches based on multilevel switching power conversion and linear-assisted schemes are discussed.



**Fig. 6** PWM and Bessel filtering applied to the envelope path for the complete cut-off and switching frequency design space. **(a)** Time-domain internal envelope **(b)** Output signal spectra



**Fig. 7** (a) SFDR index for the complete converter parameter space (b) Distortion improvement when considering a Bessel filter

### 3.1 Multi-Level Switching Power Amplifier

Multilevel converters with flying capacitors, such as the three-level (*i.e.*, two-cell) buck converter shown in Fig. 9(a), have been proposed for high-voltage, high-power applications. In this section, the three-level buck converter configuration is proposed for envelope-tracking power supplies, including RFPA systems in low-power, battery-operated electronics, to achieve favorable trade-offs in terms of switching ripple, efficiency, bandwidth, and filter element size.

The power stage of the three-level buck converter is shown in Fig. 9(a). Two synchronous buck cells,  $(Q_1, Q_2)$  and  $(Q_3, Q_4)$ , are operated at the same duty cycle  $D$ , and phase shifted by  $180^\circ$  (similar to the operation of a two-phase converter), as illustrated by the waveforms in Fig. 9(b). Assuming that the flying capacitor voltage  $V_C$  equals  $V_{in}/2$  due to topological symmetry, the switch node voltage can take one of three possible levels: 0,  $V_{in}/2$ , or  $V_{in}$ . Furthermore, by phase shifting the switching of the two pairs of transistors, the frequency of the  $V_{SW}$  pulses is  $2f_s$ , where  $f_s$  is the switching frequency. The three-level operation, in combination with the effective doubling of the switching frequency, results in favorable trade-offs in terms of decreasing the switching ripples, decreasing the switching frequency, reducing the size of the filter elements, increasing the converter open-loop bandwidth, or increasing the converter efficiency. For example, assuming the same switching frequency  $f_s$  and the same maximum switching ripples, the three-level converter requires 4 times smaller inductance and 2 times smaller capacitance compared to the standard buck converter. In the experimental prototypes, the switching frequency of the standard buck converter must be increased by a factor of  $2\sqrt{2}$  from 200 kHz to 560 kHz to obtain the same maximum output voltage ripple of 12 mV as in the three-level converter prototype. As a result of the increased switching losses, the efficiency of the standard buck converter is  $\eta = 0.83$ , while the efficiency of the three-level buck is  $\eta = 0.92$ , in spite of the increased conduction losses due to the switches connected in series.
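The inductance, capacitance and switching-frequency factors quoted above can be checked with the usual small-ripple formulas, $\Delta i = V D(1-D)/(L f)$ and $\Delta v = \Delta i/(8 C f)$. The sketch below assumes a balanced flying capacitor ($V_C = V_{in}/2$), so that the three-level switch node behaves like a two-level buck fed from $V_{in}/2$ at $2f_s$ with effective duty cycle $2D$ ($D<0.5$) or $2D-1$ ($D>0.5$). All values are normalized and hypothetical; the prototype component values are not used.

```python
import numpy as np

def buck_ripple(d, v_in, l, c, f_s):
    """Peak-to-peak inductor-current and output-voltage ripple of a standard
    (two-level) buck, small-ripple approximation."""
    di = v_in * d * (1.0 - d) / (l * f_s)
    dv = di / (8.0 * c * f_s)
    return di, dv

def three_level_ripple(d, v_in, l, c, f_s):
    """Three-level buck with V_C = V_in/2: the switch node toggles between 0 and
    V_in/2 (D < 0.5) or between V_in/2 and V_in (D > 0.5) at 2*f_s, i.e. a two-level
    buck fed from V_in/2 with effective duty 2D or 2D - 1."""
    d_eff = np.where(d < 0.5, 2.0 * d, 2.0 * d - 1.0)
    return buck_ripple(d_eff, v_in / 2.0, l, c, 2.0 * f_s)

# Normalised, hypothetical values: V_in = L = C = f_s = 1; sweep D over [0, 1]
d = np.linspace(0.0, 1.0, 10001)
di2, dv2 = buck_ripple(d, 1.0, 1.0, 1.0, 1.0)
di3, dv3 = three_level_ripple(d, 1.0, 1.0, 1.0, 1.0)

current_ratio = di2.max() / di3.max()    # worst-case ratio, same L and f_s
voltage_ratio = dv2.max() / dv3.max()    # worst-case ratio, same L, C and f_s
# Three-level converter with L/4 and C/2: same worst-case ripples as the standard buck
di3s, dv3s = three_level_ripple(d, 1.0, 0.25, 0.5, 1.0)
# Standard-buck voltage ripple scales as 1/f_s^2, so matching it needs f_s * sqrt(ratio)
fs_factor = np.sqrt(voltage_ratio)
print(current_ratio, voltage_ratio, fs_factor, 200e3 * fs_factor)
```

With these formulas the worst-case current ripple drops by 4 and the voltage ripple by 8, so the standard buck needs $\sqrt{8} = 2\sqrt{2}$ times the switching frequency (about 566 kHz from 200 kHz, in line with the roughly 560 kHz quoted for the prototypes).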



**Fig. 8** (a) Effect of buck switching power converter with  $f_x \cong 3$  MHz,  $f_0 = 10$  MHz and  $f_s = 15$  MHz upon a CDMA envelope signal (b) Design space exploration for the output signal spectrum



**Fig. 9** (a) Three-level buck converter (b) Gate signals  $g_1$ ,  $g_2$ , and switch node voltage  $V_{sw}$  for  $D < 0.5$ , and for  $D > 0.5$

In the target application of Fig. 1, it is of interest to examine the converter performance under time-varying modulation signals. Figure 10(a) shows the three-level and the standard two-level PWM signals together with the corresponding converter outputs. The three-level converter tracks the sinusoidal waveform ( $f_m = 10$  kHz) with much lower ripple.

Figure 10(b) shows the spectra of the two-level and the three-level PWM signals, with Bessel-function-shaped frequency components around the switching-frequency harmonics. Since three-level PWM ideally cancels all odd harmonics of the switching frequency, an improved design trade-off is obtained in the choice of the switching frequency  $f_s$  relative to the filter corner frequency  $f_0$  and the modulation frequency  $f_m$.
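The odd-harmonic cancellation follows from the $180^\circ$ interleaving: delaying the second gate signal by half a period multiplies its $h$-th harmonic by $e^{-j\pi h} = (-1)^h$, so odd harmonics of the two cells cancel in $V_{sw} = \tfrac{V_{in}}{2}(g_1 + g_2)$ while even ones add. The sketch below assumes a balanced flying capacitor and a constant, hypothetical duty cycle; it illustrates only the carrier-harmonic mechanism, not the modulated spectra of Fig. 10(b).

```python
import numpy as np

N = 256        # samples per switching period (even, so 180 degrees = N/2 samples)
PERIODS = 8    # number of switching periods simulated

def gate(duty, shift=0):
    """Ideal gate signal (1 = top switch on) with constant duty, delayed by `shift` samples."""
    one_period = (np.arange(N) < round(duty * N)).astype(float)
    return np.roll(np.tile(one_period, PERIODS), shift)

def switching_harmonics(v):
    """Single-sided amplitude at each multiple h of f_s (index h; index 0 not used)."""
    spectrum = 2.0 * np.abs(np.fft.rfft(v)) / len(v)
    return spectrum[::PERIODS]            # bin h*PERIODS corresponds to h*f_s

V_IN, D = 1.0, 0.3                        # hypothetical input voltage and duty cycle
two_level = V_IN * gate(D)
# Flying capacitor assumed balanced at V_in/2, so v_sw = (V_in/2) * (g1 + g2)
three_level = 0.5 * V_IN * (gate(D) + gate(D, shift=N // 2))

h2 = switching_harmonics(two_level)
h3 = switching_harmonics(three_level)
for h in range(1, 7):
    print(f"{h} f_s: two-level {h2[h]:.4f}  three-level {h3[h]:.4f}")
```

The first nonzero three-level component sits at $2f_s$, which is the effective frequency doubling exploited in the filter design.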

An experimental three-level buck converter has been constructed with the filter elements  $L = 10 \mu\text{H}$ ,  $C = 0.66 \mu\text{F}$ ,  $R = 5 \Omega$  (representing the RFPA load), and the flying capacitor of  $C_x = 47 \mu\text{F}$ . The switching frequency is  $f_s = 200$  kHz, and  $K = 1.25$ . The output voltage is duty cycle modulated using a 10 kHz rectified sinusoidal waveform as the reference signal.



**Fig. 10** (a) Time-domain waveforms for three-level and two-level PWM modulated signals and the output converter voltages. (b) Spectra of the three-level and the two-level PWM signals with sinusoidal modulation. (log y-axis)



**Fig. 11** Output voltage waveform and frequency spectrum. (a) standard (two-level) buck converter (b) three-level buck converter

The output voltage of the converter is obtained in time and frequency domains, and the results are compared in Figs. 11(a,b) against the setup with the conventional two-level buck converter. It can be observed that both converters are capable of reproducing the envelope waveform, but the switching harmonics with the three-level buck are significantly smaller.

### 3.2 Linear-Assisted Switching Power Amplifier

Figure 12(a) shows a polar RF PA implementation that uses a linear-assisted switching power converter as the wideband, efficient baseband power converter. In linear-assisted switched-mode approaches, efficiency optimization remains a challenge, although this topology promises enhanced performance by trading off tracking fidelity against efficiency.

The total efficiency  $\eta_{total}$  of a linear-assisted switcher for an arbitrary signal can be computed from the model of Fig. 12(b), in which the input and output powers of the switching and linear amplifiers are expressed as functions of the corresponding amplitude density distributions:

$$\eta_{total} = \frac{P_{out}}{P_{in-sw} + P_{in-Lin}}, \quad \eta_{total} = \frac{\sum_{k=1}^n a_k^2 p_o(a_k) \Delta a}{\sum_{k=1}^{n_{sw}} \frac{a_k^2}{\eta_{sw}(f_{sw})} p_L(a_k) \Delta a + \sum_{k=1}^{n_{lin}} \frac{a_k^2}{\eta(a_k)} p_H(a_k) \Delta a} \quad (1)$$

where  $p_o(a_k)$ ,  $p_L(a_k)$  and  $p_H(a_k)$  are the discretized amplitude density distributions (*i.e.*, histograms over the amplitude slots  $a_k$ ) of the signals at the output of the linear-assisted switcher, the switching amplifier, and the linear amplifier, respectively;  $n$ ,  $n_{sw}$  and  $n_{lin}$  are the corresponding numbers of amplitude slots;  $\eta_{sw}(f_{sw})$  is the efficiency of the switching amplifier, which is a function of the switching frequency; and  $\eta(a_k)$  is the efficiency of an ideal linear amplifier as a function of the signal amplitude  $a_k$ .

**Fig. 12** (a) A polar modulation transmitter including a linear-assisted switcher as the envelope-tracking power supply for the RF power amplifier (b) Model of the linear-assisted switching amplifier, with  $p_i$ ,  $p_o$ ,  $p_L$  and  $p_H$  the amplitude density distributions at the input and the output of linear assisted switching amplifier, and the output of switching and linear amplifier
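Equation (1) can be evaluated directly from sampled waveforms by histogramming each signal. The following is a minimal sketch under stated assumptions, not the authors' tool: a normalized 1 Ω load (power $= a^2$), a constant switcher efficiency `eta_sw`, one common slot grid over $[-V_{supply}, V_{supply}]$ for all three histograms (so $n = n_{sw} = n_{lin}$), and an ideal linear stage with instantaneous efficiency $\eta(a) = |a|/V_{supply}$. The waveforms and the names `amplitude_density` and `eta_total` are hypothetical.

```python
import numpy as np

def amplitude_density(v, v_max, n_slots=65):
    """Discretised amplitude density p(a_k) on n_slots equal slots over [-v_max, v_max]."""
    counts, edges = np.histogram(v, bins=n_slots, range=(-v_max, v_max))
    a_k = 0.5 * (edges[:-1] + edges[1:])            # slot centres
    da = edges[1] - edges[0]                        # slot width (Delta a)
    return a_k, counts / (counts.sum() * da), da    # normalised: sum(p * da) == 1

def eta_total(v_out, v_sw, v_lin, eta_sw, v_supply, n_slots=65):
    """Eq. (1) on a normalised 1-ohm load, assuming an ideal linear stage whose
    instantaneous efficiency is eta(a) = |a| / v_supply."""
    a, p_o, da = amplitude_density(v_out, v_supply, n_slots)
    _, p_L, _ = amplitude_density(v_sw, v_supply, n_slots)
    _, p_H, _ = amplitude_density(v_lin, v_supply, n_slots)
    p_out = np.sum(a**2 * p_o * da)
    p_in_sw = np.sum(a**2 / eta_sw * p_L * da)
    # a^2 / eta(a) = v_supply * |a|, which avoids dividing by zero at a = 0
    p_in_lin = np.sum(v_supply * np.abs(a) * p_H * da)
    return p_out / (p_in_sw + p_in_lin)

# Hypothetical split: the switcher supplies a slow component, the linear stage a fast one
t = np.arange(100_000) * 1e-7
slow = 2.0 + 0.5 * np.sin(2 * np.pi * 1e3 * t)
fast = 0.3 * np.sin(2 * np.pi * 50e3 * t)
print(eta_total(slow + fast, slow, fast, eta_sw=0.9, v_supply=5.0))
```

When the linear path carries nothing, the result collapses to `eta_sw`; every amplitude moved to the linear path pulls the total efficiency down, which is the trade-off that the choice of band separation frequency balances.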

Changing the band separation frequency  $f_B$  of the filters affects the amplitude distributions  $p_L$ ,  $p_H$  and  $p_o$ , as well as the switching frequency of the switching amplifier. As a result, the total efficiency depends on  $f_B$ , as illustrated by the following example of an EDGE (Enhanced Data rates for GSM Evolution) baseband envelope signal.

The efficiency optimization proceeds by first designing the input/output filters for a given band separation frequency  $f_B$  and then computing the total efficiency  $\eta_{total}$  for the designed filters. This process is repeated for different values of  $f_B$  until the maximum total efficiency is found; the corresponding  $f_B$  is the optimum value.

Figure 13(a) shows the total efficiency  $\eta_{total}$  as a function of the band separation frequency  $f_B$ . It is observed that the optimum band separation frequency that maximizes the total efficiency is located at very low frequency ( $f_B = 2$  kHz). The switching frequency  $f_{sw} = 200$  kHz is chosen for the maximum efficiency of the switching converter. The output filter is designed as a second order Butterworth filter.

Figures 14(a), (b), and (c) show the amplitude distribution of  $p_i(a)$  at the input of linear-assisted switching amplifier, and  $p_L(a)$  and  $p_H(a)$  at the output of switching and linear amplifier respectively. The amplitude distributions are obtained for the optimum band separation frequency  $f_B = 2$  kHz.

Figure 13(b) shows the simulation waveforms of the linear-assisted switching amplifier with  $f_B = 2$  kHz and tracking an EDGE signal. The signal  $v_o$  is the combination of the two signals  $v_{swo}$  and  $v_{lino}$ .



**Fig. 13** (a) Total efficiency  $\eta_{total}$  as a function of band separation frequency  $f_B$  for the square wave input signal. (b) Simulation waveforms in the linear-assisted switching amplifier tracking an EDGE signal



**Fig. 14** Amplitude density distribution at the output of (a) linear, (b) switching, and (c) linear-assisted switching amplifier for EDGE

## 4 Modulation and Control Aiming Wideband Efficient Amplification

Complementing the previous review of alternative topologies targeting wideband efficient amplification, in this section, alternative modulation and control techniques are discussed.

### 4.1 High-Order Filter Buck Converter with Asynchronous Low-OSR Sigma-Delta Modulation

This section compares two methods for time-encoding continuous-time wideband signals, namely asynchronous sigma-delta modulation and PWM, as modulation methods for buck-based switching power converters operating as wideband power amplifiers. In the applications of interest for this chapter, there is a trade-off between tracking error and OverSampling Ratio (OSR), defined as half the average number of switchings per input period. This section characterizes and compares, in terms of tracking error, the complete design space of modulation depth, OSR and filter cutoff frequency for each encoding method. An additional design-space dimension is also characterized: the effect of using a high-order buck converter. The results of this comparison point out that low-OSR modulation combined with high-order buck converters is a good candidate to address the challenge of wideband high-efficiency amplifiers.

A switching amplifier can be split into two main blocks: an encoding machine (signal path) and a power decoding machine (power path). The Time Encoding Machine (TEM) encodes the continuous-time input signal into a discrete-amplitude time sequence, which is then sent to the Power Time Decoding Machine (PTDM) that recovers the original signal (with ripple) from the sequence (Fig. 15).

Conventional switching amplifiers use a high-OSR PWM encoder and a low-order filter, operating in closed loop. The underlying idea is to shift the PWM spectrum to high frequency, so that the low-order filter can reject most of it; moreover, the low-order filter facilitates stable closed-loop operation. This approach, which minimizes the conduction losses, performs well for regulators but not for signal tracking. In modern applications such as EER or line drivers for PLC, the required switching frequency may be so high that switching losses become unacceptable. When the switching converter operates in open loop (as in some linear-assisted schemes), a high-order filter can be used; consequently, the same error can be achieved with a lower encoding frequency. This new topology adds one degree of freedom, through which conduction losses, switching losses and tracking error can be balanced, thus reducing the overall losses.

**Fig. 15** Signal encoding and recovery for each time-encoding machine

The simplest way to encode with low OSR is to reduce the sawtooth frequency of a PWM modulator. However, simulations reveal that PWM is unsuitable for low OSR. It has an inherent slew-rate limitation (the input signal slope cannot exceed the sawtooth slope) and, more seriously, its Bessel-shaped high-frequency harmonics alias severely into the baseband.
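The slew-rate limitation can be quantified with a short worked example under simple assumptions (not taken from the chapter): a sawtooth carrier rising from 0 to $V_p$ once per switching period, so its slope is $V_p f_s$, and a sinusoidal reference $x(t) = \tfrac{V_p}{2}\left[1 + M\sin(2\pi f_m t)\right]$ with modulation depth $0 \le M \le 1$. Requiring, conservatively, that the reference slope never exceed the carrier slope gives

$$\max_t \left|\frac{dx}{dt}\right| = \pi f_m M V_p \le V_p f_s \quad\Longrightarrow\quad \frac{f_s}{f_m} \ge \pi M$$

Since sawtooth PWM switches twice per carrier period, the OSR defined above (half the average number of switchings per input period) equals $f_s/f_m$. At full modulation depth this sketch therefore requires an OSR of at least $\pi \approx 3.1$, already above the Nyquist minimum of two.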

According to Nyquist's sampling theorem, the minimum OSR to encode without aliasing is two. However, this theorem applies to regular sampling, whereas PWM samples irregularly (constant frequency but variable duty cycle, with both rising and falling edges acting as samples). This yields very high errors at all switching frequencies except those that are integer multiples of the input signal frequency, where PWM encodes without aliasing. Aliasing in PWM is due to its synchronous operation: the sawtooth signal forces the encoder to switch regardless of the input signal waveform. This constraint suggests encoding with an asynchronous TEM, such as an Asynchronous Sigma-Delta Modulator (ASDM), which removes the clock constraint.

The ASDM block diagram is depicted in Fig. 15. Simulations confirm that this TEM encodes more efficiently than PWM at low OSR (Fig. 16(b)), although some aliasing remains (error notches). On the other hand, the switching frequency is no longer constant, and the average switching frequency is hard to predict beforehand, although it is bounded.
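The varying but bounded switching frequency can be seen in a minimal simulation of an ASDM in its common textbook form: an integrator in a loop with a Schmitt trigger (hysteresis comparator). The output level `b`, hysteresis `delta` and integrator constant `kappa` are hypothetical parameters, and the loop may differ in detail from the block diagram of Fig. 15. For a constant input $u$ the integrator ramps between $\pm\delta$ at rates $(b \mp u)/\kappa$, so the output mean equals $u$ and the period is $T = 4\delta\kappa b/(b^2-u^2)$, shortest at $u=0$, which bounds the switching frequency by $b/(4\delta\kappa)$.

```python
import numpy as np

def asdm(u, dt, b=1.0, delta=0.1, kappa=1.0):
    """Asynchronous sigma-delta modulator (integrator + Schmitt trigger in a loop),
    simulated with forward Euler. Requires |u| < b. Returns y[n] in {-b, +b}."""
    x, y = 0.0, b
    out = np.empty(len(u))
    for n, u_n in enumerate(u):
        x += dt * (u_n - y) / kappa      # integrate input minus fed-back output
        if x <= -delta:                  # Schmitt trigger thresholds at +/- delta
            y = -b
        elif x >= delta:
            y = b
        out[n] = y
    return out

def mean_period(y, dt):
    """Average time between rising edges of the two-level output."""
    rising = np.flatnonzero(np.diff(y) > 0)
    return np.mean(np.diff(rising)) * dt, rising

dt = 1e-4
for u0 in (0.0, 0.5, 0.8):
    y = asdm(np.full(100_000, u0), dt)
    period, rising = mean_period(y, dt)
    mean_level = np.mean(y[rising[0]:rising[-1]])   # whole periods only
    # Analytical period for constant input: T = 4 * delta * kappa * b / (b^2 - u^2)
    print(u0, round(period, 4), round(0.4 / (1 - u0**2), 4), round(mean_level, 3))
```

The duty cycle encodes the input exactly while the switching rate adapts to the signal, which is the asynchronous behaviour that avoids the clock-induced aliasing of PWM.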

When encoding with low OSR, switching losses decrease but the recovered error increases. To keep the recovered error low, it is suggested to use a high-order filter, which can be interpreted as a signal reconstruction filter. Figure 16 shows the recovered error for different PTDMs and different TEMs. The improvement is less significant at low OSR, since the error sources are the in-band harmonics, which are not rejected by the filter. The high-order PTDM adds one degree of freedom, the filter shape. Other filter topologies (such as Chebyshev) have a sharper cutoff, but their group delay is worse.



**Fig. 16** (a) PWM single tone performance for different modulation depths (b) ASDM single tone performance for different modulation depths (4th order Butterworth filter with  $f_c = 1.5f_0$ )



**Fig. 17** ASDM encoding and reconstruction; input (*top*), encoded (*middle*) and recovered (**a**) circuit-level simulation (**b**) experimental prototype

As a proof of concept of low-OSR sigma-delta modulation combined with a high-order filter, experimental ASDM waveforms matching circuit-level simulation results are shown in Fig. 17. The displayed waveforms are the encoded signal (blue) and the integrator output (green), since they are the most important and critical signals in this TEM.

### 4.2 Digital Predistortion of Filter Characteristics

Implementing the best delay compensation between the two polar paths is an open challenge. This delay originates in the filter needed to restore the envelope after PWM modulation, which changes both the phase and the magnitude of the different spectral components of the envelope. With a Bessel filter, as discussed in Section 2, the group delay is constant in the passband, so the delay mismatch can be minimized and compensated by a single delay; however, the magnitude effects and the out-of-band non-constant group delay are still present, leading to suboptimal performance.

In this subsection, an alternative method for compensating the filter dynamics is presented, showing improvements over the standard delay-compensation approach. The underlying idea is to predistort the reference signal applied to the PWM modulator, which is generated digitally in baseband, with the inverse of the filter transfer function. Considering a second-order LC filter as the output reactive lowpass lossless filter, its transfer function is given by (2) and its inverse by (3). Applying (3) to the reference signal  $x(t)$  yields the pre-compensated signal (4). Implementing (4) can be problematic because the first and second derivatives amplify any noise in the baseband envelope signal. However, in this case the source signal is generated digitally rather than sampled, so derivative noise can be neglected, and the derivatives can be implemented simply as finite differences. The implementation of this predistortion method is shown in Fig. 18.



**Fig. 18** Digital signal generator modification to predistort filter dynamics

$$H(s) = \frac{\omega_0^2}{s^2 + 2\zeta\omega_0 s + \omega_0^2} \quad (2)$$

$$H^{-1}(s) = \frac{1}{\omega_0^2} s^2 + \frac{2\zeta}{\omega_0} s + 1 \quad (3)$$

$$\tilde{x}(t) = \frac{1}{\omega_0^2} \frac{d^2 x(t)}{dt^2} + \frac{2\zeta}{\omega_0} \frac{dx(t)}{dt} + x(t) \quad (4)$$
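Equation (4) with finite-difference derivatives can be sketched in a few lines. This is a minimal illustration under assumed conditions: a hypothetical 100 MHz reference update rate, a 300 kHz filter with $\zeta = 0.7$, a two-tone test envelope that is periodic in the simulation window, and the LC filter of (2) applied in periodic steady state through the FFT as a stand-in for a circuit simulation. PWM saturation of the predistorted reference is ignored.

```python
import numpy as np

def predistort(x, dt, w0, zeta):
    """Eq. (4) with the derivatives replaced by central finite differences.
    x is treated as periodic (np.roll wraps around the ends)."""
    dx = (np.roll(x, -1) - np.roll(x, 1)) / (2.0 * dt)
    d2x = (np.roll(x, -1) - 2.0 * x + np.roll(x, 1)) / dt**2
    return d2x / w0**2 + 2.0 * zeta * dx / w0 + x

def lc_filter(u, dt, w0, zeta):
    """Steady-state response of H(s) from Eq. (2) to a periodic input, via the FFT."""
    s = 2j * np.pi * np.fft.rfftfreq(len(u), dt)
    h = w0**2 / (s**2 + 2.0 * zeta * w0 * s + w0**2)
    return np.fft.irfft(np.fft.rfft(u) * h, n=len(u))

# Hypothetical values: 100 MHz update rate, 300 kHz filter, zeta = 0.7, and two
# envelope tones that complete whole cycles in the 20 us window
dt, n = 1e-8, 2000
t = np.arange(n) * dt
x = 0.5 + 0.2 * np.sin(2 * np.pi * 50e3 * t) + 0.1 * np.sin(2 * np.pi * 150e3 * t)
w0, zeta = 2 * np.pi * 300e3, 0.7

err_plain = np.max(np.abs(lc_filter(x, dt, w0, zeta) - x))
err_pd = np.max(np.abs(lc_filter(predistort(x, dt, w0, zeta), dt, w0, zeta) - x))
print(f"max tracking error: plain {err_plain:.3g}, predistorted {err_pd:.3g}")
```

Without predistortion, the filter's magnitude and phase errors at the upper tone dominate; with it, the residual error is limited only by the finite-difference approximation, which shrinks as the update rate grows relative to the envelope bandwidth.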

To provide representative results, two different simulations have been considered. For the envelope spectrum, a 250 kHz pseudo-random noise signal and a fixed-frequency filter, matching the current hardware implementation, are considered. For EVM and spectral-mask evaluation, a real set of EDGE-modulated signals is used. Figures 19 and 20 show the results for the predistortion method.

Pre-compensation with real filter parameters, even when deviations due to component tolerances are considered, shows a clear improvement: in Fig. 19(a) the envelope recovers its original spectrum, and in Fig. 19(b) the simulated EDGE waveform meets the spectral mask requirements of the EDGE standard. Fig. 20 shows the relative improvements: for the same filter parameters, the EVM decreases when precompensation is applied, for both Bessel and Butterworth filters over a cut-off frequency sweep from 75 kHz to 500 kHz.

**Fig. 19** Digital predistortion **(a)** Pseudo-random noise input signal. **(b)** EDGE standard signal spectra

**Fig. 20** Recovered signal EVM as a function of output filter cut-off frequency

## 5 Conclusions

This chapter has reviewed the state of the art and identified several open challenges in the field of wideband efficient switching amplifiers. As a relevant example of advanced applications of on-chip high-density power management circuits for portable applications, the strategic technique of wideband adaptive RF PA power management has been discussed. The main idea is adaptive power control of the RF PA by dynamically scaling its supply voltage to track the fast envelope of the modulated signal. The technique yields improved efficiency and linearity, and allows the same hardware to support different communication standards. The technical challenges are still significant, making this an area of active research and development. The chapter has addressed different aspects at the system, topology and modulation/control levels.

Section 2 studies the effects of the switching power amplifier that amplifies the envelope signal in an EER scheme. For a two-tone test aimed at distortion characterization, the identification of representative design parameters of the switching power converter, namely the two frequency ratios  $f_0/f_x$  and  $f_s/f_0$, together with the unified performance index SFDR, has been discussed. Different filtering approaches in the envelope path and phase compensation schemes in the overall polar scheme have been evaluated, leading to the conclusion that a constant-group-delay Bessel filter with pure delay compensation provides improved distortion performance. Characterization of switching effects in the envelope path reveals that they are generally masked by the distortion due to the filter, so that higher cut-off frequencies with low switching frequencies perform better than low cut-off frequencies with high switching frequencies. Finally, a design space exploration of the buck converter parameters for the wideband envelope signal associated with CDMA modulation in the IS-95 standard has been presented. Evaluating distortion in the spectral domain has made it possible to bridge circuit-level design parameters and system-level performance indexes.

In Section 3, devoted to topologies, Section 3.1 first proposes a three-level converter as an envelope-tracking power supply for RF power amplifiers. The three-level operation, in combination with the effective doubling of the switching frequency, results in favorable trade-offs in terms of decreasing the switching ripple, decreasing the switching frequency (thereby increasing efficiency), reducing the size of the filter elements, and increasing the converter open-loop bandwidth. These results are analogous to those achieved with a two-phase parallel-connected buck configuration. However, in converters where the inductor current ripple is relatively large, which is often the case in low-voltage, point-of-load applications, the single inductor of the three-level converter can be significantly smaller than the two inductors of the two-phase configuration. Furthermore, in the high-ripple case, the switch conduction losses are not significantly higher in the three-level converter. Experimental results demonstrate how the desired envelope tracking performance can be achieved with much reduced switching noise or with improved efficiency compared to the standard buck converter.

Secondly, Section 3.2 revisits another approach to implementing wide-bandwidth, high-efficiency power amplifiers, based on a combination of a linear amplifier (for wide bandwidth) and a switching amplifier (for high efficiency). This subsection addresses efficiency modeling and efficiency optimization for the linear-assisted switcher, including pre-compensation filters to achieve near-ideal system frequency responses. A method is proposed to separate the frequency bands of the switcher and the linear path for optimum efficiency. Compared to the linear amplifier alone, the system efficiency is improved significantly. The optimum band separation improves the system efficiency from 58% to 65% compared to the case in which the switching amplifier provides only the DC portion of the output signal.

In Section 4, two alternative modulation methods for improved wideband efficient amplification are discussed. Section 4.1 provides a design-oriented parameter-space characterization of asynchronous sigma-delta wideband signal modulation for buck switching power converters, in comparison to conventional PWM. First, the tracking error as a function of modulation depth has been characterized for adaptive-frequency asynchronous sigma-delta modulation and for constant-switching-frequency conventional PWM, revealing enhanced performance for the former. Afterwards, the signal reconstruction error has been obtained for different OSRs. Finally, the effect of using a high-order reactive lowpass filter in the buck converter as an improved signal reconstruction method has been presented. In conclusion, the complete design space exploration shows that low-OSR, high-order buck converters are candidates of interest to address the challenge of wideband high-efficiency switching amplifiers, since they provide lower tracking errors while limiting the ratio of average switching frequency to modulating frequency, thereby bounding switching losses. In Section 4.2, a technique to compensate for filter effects has been proposed. The technique is intended for application in the digital signal generator and can be extended to closed-loop operation. It allows the filter cut-off frequency of PWM switching amplifiers to be decreased, thereby relaxing the switching frequency requirements.

**Acknowledgments** Part of the work presented in this chapter has been supported by industrial sponsors of the Colorado Power Electronics Center (CoPEC). Partial funding by projects TEC2004-05608-C02-01/TEC2007-67988-C02-01 from the Spanish MCYT and EU FEDER funds is acknowledged.

## References

1. Eduard Alarcón, Dragan Maksimović, "Powering ICs: trends in on-chip power regulators", invited paper, Indian Institute of Technology, Kharagpur, India, December 23, 2005.
2. B. Sahu and G. A. Rincon-Mora, "A high-efficiency linear RF power amplifier with a power-tracking dynamically adaptive buck-boost supply," *Microwave Theory and Techniques, IEEE Transactions on*, vol. 52, pp. 112, 2004.
3. Vahid Yousefzadeh, Eduard Alarcón and Dragan Maksimović, "Three-level buck converter for envelope tracking applications", *IEEE Power Electronics Letters*, March 2006.
4. Vahid Yousefzadeh, Eduard Alarcón, Dragan Maksimović, "Efficiency optimization in linear-assisted switching power converters for envelope tracking in RF Power amplifiers", *IEEE International Symposium on Circuits and Systems*, Kobe, Japan, May 23–26, 2005.
5. Itoh, T., Haddad, G. and Harvey, J., *RF Technologies for Low Power Wireless Communications*, John Wiley and Sons, Inc., 2001.
6. Yousefzadeh V., Wang, N., Popovic, Z., and Maksimović, D. "Digitally Controlled DC-DC Converter for RF Power Amplifier," *IEEE Applied Power Electronics Conference*, February 2004.
7. Sevic J., "Statistical characterization of RF power amplifier efficiency for IS-95 CDMA digital wireless communication systems," *Proceedings 2nd Annual Wireless Communication Conference*, 1997.
8. G. Hanington, P.F. Chen, P. M. Asbeck and L.E. Larson, "High-efficiency power amplifier using dynamic power-supply voltage for CDMA applications", *IEEE Transactions on Microwave theory and techniques*, vol. 47, n° 8, pp. 1471–1476, August 1999.
9. Kazimierczuk, M. "Collector amplitude modulation of the Class E tuned power amplifier", *IEEE Transactions on Circuits and Systems*, vol CAS-31, n°6, June 1984, pp. 543–549.
10. Su, D.K., McFarland, W.J., "An IC for linearizing RF power amplifiers using envelope elimination and restoration", *IEEE Journal of Solid-State Circuits*, Volume: 33, Dec. 1998 pp. 2252–2258.
11. Nagle, P., Burton, P., Heaney, E., and McGrath, F., "A Wide-Band Linear Amplitude Modulator for Polar Transmitters Based on the Concept of Interleaving Delta Modulation", *IEEE Journal of Solid-State Circuits*, vol. 37, No. 12, December 2002.
12. Sahu, B., Rincon-Mora, G.A., "System-level requirements of DC-DC converters for dynamic power supplies of power amplifiers", *IEEE Asia-Pacific Conference, ASIC 2002*, Aug. 2002 Pages:149–152.
13. Soto, A., Oliver, J.A., Cobos, J.A., Cezon, J., and Arevalo, F., "Power supply for a radio transmitter with modulated supply voltage", *IEEE Applied Power Electronics Conference, APEC '04*, Volume: 1, Feb. 2004 pp. 392–398.
14. Marco, L., "Effects of switching power converter nonidealities in EER technique for implementation of polar RF PAs", MSc thesis, UPC, Barcelona, Spain, Oct 2004.
15. Raab, F.H., "Intermodulation distortion in Kahn-technique transmitters", *IEEE Trans. on Microwave Theory and Techniques*, Volume: 44, Issue: 12, December 1996, pp. 2273–2278.
16. Rudolph, D., "Out-of-band emission of digital transmissions using Kahn EER technique", IEEE Trans. on Microwave Theory and Techniques, Volume: 50, Issue: 8, August 2002, pp. 1979–1983.
17. Li, Y., Maksimović, D. "High efficiency wide bandwidth power supplies for GSM and EDGE RF power amplifiers," IEEE International Symposium on Circuits and Systems, ISCAS 2005, pp. 1314–1317, Vol. 2.
18. Raab, F.H., Asbeck, P., Cripps, S., Kenington, P.B., Popovic, Z.B., Pothecary, N., Sevic, J.F., and Sokal, N.O., "Power amplifiers and transmitters for RF and microwave," IEEE Trans. on Microwave Theory and Techniques, Volume: 50, Issue: 3, March 2002, pp. 814–826.
19. Mann, S.; Beach, M.; Warr, P.; McGeehan, J.; "Increasing the talk-time of mobile radios with efficient linear transmitter architectures," Electronics & Communication Engineering Journal, Volume: 13, Issue: 2, April 2001, pp. 65–76.
20. F. Wang; A. H. Yang, D.F. Kimball, L. E. Larson and P. M. Asbeck, "Design of wide-bandwidth envelope-tracking power amplifiers for OFDM applications", IEEE Transactions on Microwave Theory and Techniques, Vol. 53, Issue 4, April 2005 pp. 1244–1255
21. Midya, P., "Linear switcher combination with novel feedback", IEEE Power Electronics Specialists Conference, PESC. 2000, pp. 1425–1429, Vol. 3.
22. Feipeng Wang; Ojo, A.; Kimball, D.; Asbeck, P.; Larson, L., "Envelope tracking power amplifier with pre-distortion linearization for WLAN 802.11g", IEEE Microwave Symposium Digest, 2004 IEEE MTT-S, 6–11 June, pp. 1543–1546 Vol.3.
23. Raab, F.H., "Split-band modulator for Kahn-technique transmitters", Microwave Symposium Digest, 2004 IEEE MTT-S, 6–11 June 2004 pp. 887–890 Vol.2.
24. H. Ertl, J. W. Kolar, and F. C. Zach, "Basic Considerations and Topologies of Switched-Mode Assisted Linear Power Amplifiers", IEEE Transactions on Industrial Electronics, Vol. 44, NO. 1, February 1997.
25. Nam-Sung Jung; Nam-In Kim; Gyu-Hyeong Cho, "A new high-efficiency and super-fidelity analog audio amplifier with the aid of digital switching amplifier: class K amplifier", IEEE Power Electronics Specialists Conference, PESC 98, 17–22 May, pp:457–463 Vol.1.
26. Van der Zee, R.A.R.; van Tuijl, E.A.J.M., "A power-efficient audio amplifier combining switching and linear techniques", IEEE Journal of Solid-State Circuits, Volume: 34, Issue: 7, July 1999, pp. 985–991.
27. Ginart, A.E.; Bass, R.M.; Leach, W.M., Jr., "High efficiency class AD audio amplifier for a wide range of input signals", IEEE Industry Applications Conference, IAS 1999, 3–7 Oct., pp:1845–1850 Vol.3.
28. Marco, L.; Poveda, A.; Alarcon, E.; Maksimović, D.; "Bandwidth limits in PWM switching amplifiers", IEEE International Symposium on Circuits and Systems ISCAS2006, Kos Island, Greece, May 21–24, 2006.
29. Ali, I.; Griffith, R., "A fast response, programmable PA regulator subsystem for dual mode CDMA/AMPS handsets", IEEE Radio Frequency Integrated Circuits, RFIC 2000, 11–13 June pp:231–234.
30. Trescases, O.; Wai Tung Ng, "Variable output, soft-switching DC/DC converter for VLSI dynamic voltage scaling power supply applications", IEEE Power Electronics Specialists Conference, PESC 2004, 20–25 June, pp:4149–4155, Vol.6.
31. Engel Roza, "Analog-to-Digital Conversion via Duty-Cycle Modulation", IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 44, no. 11, November 1997.
32. Aurel A. Lazar and László T. Tóth, "Perfect Recovery and Sensitivity Analysis of Time Encoded Bandlimited Signals", IEEE Trans. on Circuits and Systems I: Regular Papers, vol. 51, no. 10, pp. 2060–2073, October 2004.

# Design Methodology and Circuit Techniques for Any-Load Stable LDOs with Instant Load Regulation and Low Noise

Vadim Ivanov

**Abstract** Applying the structural methodology to LDO design creates a new class of circuits: stable with any load, with instant transient response, high power supply rejection and low noise. Examples are presented of LDOs embedded in an SoC for an SRAM unit (5 ns reaction time to load steps), for a radio transmitter (shaping the required noise-versus-frequency curve) and for memory retention in the shutdown state (300 nA quiescent current). These LDOs can operate with or without off-chip load capacitors; they are robust to process and temperature variations and portable to any CMOS process.

## 1 Introduction

The large drain-source and gate leakage of core transistors in CMOS processes with a minimum gate length of 90 nm and below creates a severe on-chip power management problem. Complicated powering schemes have been developed, implying multiple power domains on the system-on-chip (SoC). These domains may include DSP core(s), a few SRAM banks, analog units such as GSM or Bluetooth radios, and audio units. The chip is powered by one or two DCDC converters, which are followed by numerous LDOs [1]. LDOs for digital domains must keep the output voltage within the error window when the load switches instantly from zero to maximum current (and back). LDOs for analog units may require low noise and high PSRR. Some of these LDOs may control the body biasing inside the power domain, which demands bidirectional output current capability. Having 15–20 LDOs in one SoC has become common practice. Clearly, using an external load capacitor for each of these LDOs is prohibitively expensive, so LDOs must be designed to work with on-chip load capacitors (100 pF to a few nF) only.

---

V. Ivanov (✉)  
Texas Instruments, Inc., Tucson, AZ 85706, USA  
e-mail: ivanov\_vadim@ti.com

**Fig. 1** The standard LDO structure



An LDO with the traditional structure, shown in Fig. 1, comprises an error amplifier connected to the gate of a large PMOS pass device [2]. This structure has two main problems in SoC applications.

1. For any compensation scheme, there exists a combination of load current and load capacitance for which such an LDO is unstable, which results in a requirement for some minimum (or maximum) load capacitance and/or its ESR. Uncertain stability leads to dedicated compensation in each application, multiplying the design effort in SoC power management.
2. If the quiescent current ( $I_q$ ) is limited to below 10% of the maximum load current, the reaction time to a load step is in the  $\mu\text{s}$  range. When only an on-chip load capacitor is used, this is too long to keep output voltage transients within an acceptable (<10%) error window.

Recently, dual-loop LDO structures that improve the load-step reaction time have been shown [3–5]. A figure of merit for LDO dynamics, suggested in [4], combines the maximum load current  $I_{L\max}$ , load capacitance  $C_L$ , dynamic error  $\delta V_{out}$  and quiescent current  $I_q$ :

$$FOM = C_L \delta V_{out} I_q / I_{L\max}^2 \quad (1)$$
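As a quick sketch of how Eq. (1) combines its four quantities, the Python function below evaluates it in SI units, so the result comes out in seconds (lower is better). The chapter does not state the units behind the FOM values it quotes, so the sample numbers here are hypothetical and are not meant to reproduce them; the helper name `ldo_fom` is illustrative.

```python
def ldo_fom(c_load_f: float, dv_out_v: float, i_q_a: float, i_load_max_a: float) -> float:
    """Figure of merit of Eq. (1): FOM = C_L * dV_out * I_q / I_Lmax**2.

    With SI inputs (F, V, A, A) the result has units of seconds; lower is better.
    """
    if i_load_max_a <= 0:
        raise ValueError("maximum load current must be positive")
    return c_load_f * dv_out_v * i_q_a / i_load_max_a ** 2


# Hypothetical example values (not taken from any measured design).
fom = ldo_fom(c_load_f=1e-9, dv_out_v=0.1, i_q_a=1e-3, i_load_max_a=0.1)
print(f"FOM = {fom:.2e} s")
```

Because $I_{L\max}$ appears squared in the denominator, doubling the maximum load current at fixed $C_L$, $\delta V_{out}$ and $I_q$ improves the FOM fourfold.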

While having dynamic advantages, these dual-loop LDOs are still not fast enough to provide satisfactory load regulation without the support of an external capacitor.

The structural design methodology [6] has been applied to embedded LDO development. The result is a set of circuits that can operate with or without an external load capacitor, have an extremely fast reaction time to load changes, and exhibit low noise and high power supply rejection. These "any load" LDOs can sink and source load current, operate in sleep and active modes, and are robust to process and temperature variations. As external capacitors are not obligatory, multiple LDOs can be sprinkled throughout the complex SoC power management structure without extra cost or die-area penalty.

The basics of the structural design methodology are presented in Section 2, followed by circuit examples and techniques to improve particular parameters of interest: quiescent current, transient response, PSRR at different frequencies, worst-case phase margin or damping factor, and noise. Examples include an LDO for memory retention with low  $I_q$ (Section 3), an LDO for a Bluetooth transmitter with strict noise requirements (Section 5), and an LDO for an SRAM bank designed for fast load-step response while having low  $I_q$ (Section 6). Stability verification problems in multiloop systems are considered in Section 7.

## 2 Basics of the Structural Design Methodology

There are 18,000 different amplifiers that can be created from just two transistors, even before parametric variations are considered (this number is the product of the available options: NMOS/PMOS; common gate, source or drain; and four kinds of feedback for each transistor and for the amplifier as a whole). With a typical analog circuit containing more than 100 transistors, the number of variants exceeds the number of atoms in the galaxy, and only a few of them can solve the designer's problem. As a result, most analog designers use a cookbook approach, creating a new circuit from an existing one with as few changes as possible. Radically new solutions are rare, and they are considered major intellectual property by designers and their employers. A systematic way of inventing new circuits is needed. The structural methodology is such a procedure: it shows how to find a set of solutions acceptable for the application and how to weed out bad or inferior circuits instantly. It has a long record of success in the design of operational amplifiers [6], references, power amplifiers and DCDC converters, and now LDOs. By following the steps described below, a designer can find a set of satisfactory solutions, some known and some new. The designer can then make the final choice based on personal preference and secondary parameters of importance.

### 2.1 Graphic Presentation of the System

The first step in circuit design should be presenting the problem to be solved in graphic form. A graphical presentation is much more informative and easier to comprehend than any text description or set of equations. The most common language for such a presentation is the structural diagram. Another option is the signal flow graph, which has the advantages of drawing simplicity and existing formal rules for equivalent transformations [7]. Almost forgotten, but preferred by founders of control theory such as Mason and Bode, signal flow graphs have recently started to regain popularity [8].

An example of signal flow graphs of the differential stage is given in Fig. 2. The differential stage can be presented in the simple form of a single  $g_m$  link or in more detail, as illustrated in Fig. 2c. The graph in Fig. 2c includes the transconductance of each transistor and a common-mode feedback, and is called the "general structure with common-mode feedback". Properties of this graph can be extrapolated to any multi-loop, multi-dimensional structure (or a multidimensional structure can be equivalently transformed to this graph), just as complex numbers represent properties of the n-dimensional space.

**Fig. 2** Signal flow graphs of the differential stage

An analysis of the differential structure with common-mode feedback [6, supplement A] is instrumental in the design of circuits with multiple input/output variables, such as class AB stages or multiple-output DCDC converters. It also helps in estimating and selecting a circuit from the set of possible options at a single glance.
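To make the idea of a common-mode loop concrete, here is a sketch of the textbook small-signal model of a source-coupled pair, written as the node equation that such a graph encodes: the common source node $v_s$ is driven by both transconductances and fed back to both of them. It is an illustration under simplifying assumptions (output resistances and body effect ignored, a finite tail conductance `g_tail` as the only non-ideality, hypothetical element values), not a reproduction of Fig. 2c; the helper name is illustrative.

```python
def diff_pair_currents(v1, v2, gm1, gm2, g_tail):
    """Small-signal drain currents of a source-coupled pair.

    The common source node v_s closes a common-mode feedback loop:
    KCL at v_s gives gm1*(v1 - vs) + gm2*(v2 - vs) = g_tail*vs.
    Output resistances and body effect are ignored (simplifying assumption).
    """
    vs = (gm1 * v1 + gm2 * v2) / (gm1 + gm2 + g_tail)
    i1 = gm1 * (v1 - vs)
    i2 = gm2 * (v2 - vs)
    return i1, i2, vs


gm = 1e-3      # 1 mS per transistor (hypothetical)
g_tail = 1e-6  # 1 uS tail-source output conductance (hypothetical)

# Differential input: the source node stays at zero, full gm acts on the signal.
i1, i2, vs = diff_pair_currents(0.5e-3, -0.5e-3, gm, gm, g_tail)
print(f"differential: i1 - i2 = {i1 - i2:.3e} A, vs = {vs:.1e} V")

# Common-mode input: v_s follows the input, suppressing the drain currents
# by roughly the loop gain (gm1 + gm2) / g_tail.
i1c, i2c, vsc = diff_pair_currents(1e-3, 1e-3, gm, gm, g_tail)
loop_gain = (2 * gm) / g_tail
print(f"common-mode: i1 = {i1c:.3e} A, loop gain = {loop_gain:.0f}")
```

In this model the common-mode current of each transistor is $g_m v_c\,g_{tail}/(2g_m + g_{tail})$, so a weaker tail conductance (a stronger common-mode loop) gives better rejection.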

### 2.2 Dedicated Feedback Control for Each Important Parameter

The next step in circuit design is transforming the system structure into a form where every important variable is controlled by a dedicated feedback loop. Circuits without such feedback should be weeded out without further consideration. The advantage of a system in which all significant parameters are controlled is obvious; however, the main obstacle to universal application of this rule is the problem of stability in the resulting multiloop structure.

A sufficient, though not necessary, condition for the stability of the whole system is stability in each and every loop within it [9]. A feedback loop can be unconditionally stable (with any load and signal source impedance) if its open-loop transfer function has only one pole. Consequently, an easy way to ensure system stability is to design each loop with single-stage (single-pole) amplifiers only.

This restriction immensely simplifies the design process. In some cases, however, the exclusive use of single-stage amplifiers is not possible; then conventional compensation techniques have to be applied and stability has to be carefully verified.
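A short numeric check of the single-pole argument: for a loop gain $L(s) = A/(1 + s/p)$, where the output pole $p = 1/(R_{out}C_L)$ moves with the load capacitance, the phase margin stays above 90° whatever $C_L$ is. The transconductance and output resistance below are hypothetical, and the helper name is illustrative.

```python
import math


def single_pole_phase_margin(dc_gain: float, pole_rad_s: float) -> float:
    """Phase margin (degrees) of L(s) = A / (1 + s/p).

    |L(jw)| = 1 at w_c = p*sqrt(A**2 - 1); the phase there is -atan(w_c/p),
    which never reaches -90 degrees, so the margin is always above 90.
    """
    if dc_gain <= 1.0:
        return 180.0  # loop gain never exceeds unity: no crossover
    w_c = pole_rad_s * math.sqrt(dc_gain ** 2 - 1.0)
    return 180.0 - math.degrees(math.atan(w_c / pole_rad_s))


# Hypothetical loop: gm = 1 mS, output resistance 1 Mohm, so A = gm*R = 1000.
gm, r_out = 1e-3, 1e6
for c_load in (1e-12, 1e-10, 1e-8, 1e-6):  # 1 pF ... 1 uF
    pole = 1.0 / (r_out * c_load)           # the output pole moves with C_L
    pm = single_pole_phase_margin(gm * r_out, pole)
    print(f"C_L = {c_load:.0e} F -> phase margin = {pm:.2f} deg")
```

The pole location cancels out of the margin entirely, which is the algebraic form of "stable with any load".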

Standard verification of stability using the phase margin requires breaking the feedback loop and is not suitable for a multiloop system (which loop to break?). A method of small-signal stability verification for multiloop systems using AC simulations has been described in [10]. Because nonlinear effects are unavoidably present in the circuit, small-signal stability verification alone is not sufficient. Small- and large-step transient simulations, followed by extraction of the overshoot and damping factor, can be used instead, as discussed in Section 7.
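The transient-based check can be sketched as follows: measure the peak overshoot of a step response and convert it to the damping factor $\zeta$ of an equivalent second-order system with the standard relation $OS = e^{-\pi\zeta/\sqrt{1-\zeta^2}}$. This assumes the response is dominated by one complex pole pair. The synthetic waveform below stands in for a transient-simulation export, its $\zeta$ and $\omega_n$ are hypothetical, and the helper names are illustrative.

```python
import math


def overshoot_fraction(samples, final_value):
    """Peak overshoot of a step response, as a fraction of the step size."""
    start = samples[0]
    step = final_value - start
    if step == 0:
        raise ValueError("step size is zero")
    # Normalize so the response rises from 0 to 1 regardless of step polarity.
    peak = max((s - start) / step for s in samples)
    return max(peak - 1.0, 0.0)


def damping_from_overshoot(os_frac):
    """Damping factor of the equivalent 2nd-order system.

    Inverts OS = exp(-pi*zeta/sqrt(1 - zeta**2)); valid for 0 < zeta < 1.
    """
    if os_frac <= 0.0:
        return 1.0  # no overshoot: treat as critically damped or slower
    ln_os = math.log(os_frac)
    return -ln_os / math.sqrt(math.pi ** 2 + ln_os ** 2)


def second_order_step(zeta, wn, t):
    """Unit-step response of wn^2/(s^2 + 2*zeta*wn*s + wn^2), 0 < zeta < 1."""
    wd = wn * math.sqrt(1.0 - zeta ** 2)
    phi = math.acos(zeta)
    return 1.0 - math.exp(-zeta * wn * t) * math.sin(wd * t + phi) / math.sqrt(1.0 - zeta ** 2)


# Synthetic response with zeta = 0.5, wn = 1e9 rad/s, sampled every 10 ps.
ts = [i * 1e-11 for i in range(2000)]
resp = [second_order_step(0.5, 1e9, t) for t in ts]
os_frac = overshoot_fraction(resp, final_value=1.0)
print(f"overshoot = {100 * os_frac:.1f} %, zeta = {damping_from_overshoot(os_frac):.3f}")
```

Running the same extraction on both small and large steps exposes nonlinear behavior: if $\zeta$ drops noticeably for the large step, slewing or clamping is degrading the loop.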

### 2.3 Library of Elementary Cells

The next design step is implementing the system structure with elementary cells. The library of these cells includes the circuits described in every textbook on analog design, shown in Fig. 3.

It can be extended with a few lesser-known cells. In particular, the current-input amplifiers shown in Fig. 4 should be part of every designer's arsenal. In the circuit of Fig. 4a, the  $M0$  and  $M1$  currents are matched, as are the currents of  $M2/M3$ . Consequently, the input currents do not depend on the common-mode input voltage, so the common-mode input impedance is high. The differential input impedance is small and equal to  $1/g_m$ . The dependence of the output current on the input voltage (Fig. 4c) is identical to that of the standard differential stage.

The single-output version of this amplifier is shown in Fig. 4b. In this cell, the current sunk from the output is unlimited, and the output current vs. input voltage curve (Fig. 4d) is asymmetrical.

Use of the current-input amplifier cell inside local feedback loops improves the speed of these loops at least five times for any given current budget. It simplifies the frequency compensation, allowing the common-source gain stages in the signal path to be replaced with common-gate ones, which have much smaller delay.

**Fig. 3** Elementary cells library

**Fig. 4** The current-input amplifier cells

It becomes possible to eliminate all or most compensation capacitors. For example, the operational amplifier described in [11] comprises more than 25 feedback loops, yet the only compensation capacitors on its chip are the two Miller capacitors in the main signal path.

### 2.4 Features of the Good Circuit

With the structural methodology, we restrict the set of circuits under consideration to "good" circuits only.

1. A good circuit has a dedicated feedback loop controlling each parameter that is important for reaching the system goals.
2. Dynamically, each local loop and the system as a whole are stable, and their step response looks like that of a system with a first- or second-order transfer function.
3. A good circuit is robust to variations in component parameters, process and temperature.
4. Non-linear effects (start-up, power glitch, input/output overload, etc.) have been considered and the necessary clamps/limiters added.
5. For designs embedded in an SoC, a good circuit should not be sensitive to substrate noise.

Acceptable application solutions can and sometimes do exist outside the "good" circuit domain. However, in the author's 30 years of experience, these "no good" circuits have never outperformed circuits from the chosen set.

Nesting of feedback loops inside the system has been discussed above, as has the requirement of stability in each loop. The requirement of circuit robustness makes parametric optimization efforts practically useless. If the optimum of the goal function is dull (flat), a common-sense choice of parameters is good enough; if the optimum is sharp, the circuit is not robust and consequently is inadequate.

Designing a circuit for the nominal mode of operation normally takes no more than 20% of the total design time. The rest is spent on considering nonlinear effects and creating various protective measures. There is no general way to predict such effects. All we can do is study the application and play through multiple "what if?" scenarios.

In SoC design, the interaction of different units through the substrate and supply should be taken into account from the very beginning. Corrective measures can be taken in the layout and process (unit placement, isolation rings, double-well process, separate supply wiring and wirebonds), in the choice of components insensitive to substrate noise, in circuit techniques (differential signal processing), and in the choice of system architecture.

The problem-solving approach in structural design is close to the one described in [12] and to the modern philosophy called “systems thinking” [13].

## 3 Memory Retention LDO

Design of the LDO for memory retention is one of the simplest examples of using the structural methodology. In this application, the supply voltage of the SoC SRAM bank has to be kept at a value lower than necessary for operation but sufficient for preserving information. The only current consumed by the load is the SRAM bank leakage, which can vary from a few nA to tens of  $\mu$ A depending on temperature and process variations. Neither accuracy better than 100–150 mV nor high speed is required. When only the SRAM's built-in bypass capacitance is present, the LDO load capacitance can be merely 200–1000 pF; it can increase to a few  $\mu$ F if an off-chip load capacitor is used. The LDO's most important parameter is the quiescent current, which should be kept within 200–300 nA.

The voltage follower in Fig. 5a is unconditionally stable with any load, as it has only one transistor (and one pole, with time constant  $C_L/g_m$ ) in the feedback loop. Its obvious limitations are: a) the output current is limited by  $I_0$ ; b) the output impedance is high and equal to  $1/g_m$  of the  $M0/M1$  pair.
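The price of this single-pole simplicity is speed: the loop time constant is $C_L/g_m$, and with nanoampere bias $g_m$ is tiny. A rough sketch, assuming the input pair runs in weak inversion where $g_m \approx I_D/(nU_T)$, with a hypothetical slope factor $n = 1.5$ at 300 K and a hypothetical 100 nA bias; the helper names are illustrative.

```python
K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def weak_inversion_gm(i_d, n=1.5, temp_k=300.0):
    """gm = I_D / (n * U_T) for a MOSFET in weak inversion (n is an assumed slope factor)."""
    u_t = K_B * temp_k / Q_E  # thermal voltage, about 25.9 mV at 300 K
    return i_d / (n * u_t)


def follower_time_constant(c_load, i_bias, n=1.5, temp_k=300.0):
    """Time constant C_L/gm of the single-pole follower loop."""
    return c_load / weak_inversion_gm(i_bias, n, temp_k)


i_bias = 100e-9  # 100 nA bias (hypothetical, within a 300 nA budget)
for c_load in (200e-12, 1e-9, 1e-6):  # on-chip range and an off-chip capacitor
    tau = follower_time_constant(c_load, i_bias)
    print(f"C_L = {c_load:.0e} F -> tau = {tau * 1e6:.1f} us")
```

Time constants of tens to hundreds of microseconds with on-chip capacitance are consistent with the statement above that high speed is not required for memory retention.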

One way to remove the maximum-load-current limitation is to control the current through transistor  $M1$  with a feedback loop that varies  $I_0$  (Fig. 5b). In this circuit, a rise in the load current decreases the current through  $M1$ . The loop then increases  $I_0$  until the  $M1$  current returns to the reference value  $I_1$  set by the feedback loop. The increased  $I_0$  flows through  $M0$  and is mirrored by  $M2/M3$  to the load. This second feedback loop is nested inside the main loop of the regulator (around  $M1$ ). As the main loop is stable, stability of this additional loop is sufficient for overall system stability with any load.

**Fig. 5** The memory retention LDO

One possible implementation of this structure is shown in Fig. 5c [14], where the reference, amplifier and actuator of the feedback loop consist of  $M4/M5/M6$  and  $I_1$ . The output current in this circuit is limited only by the size and current capability of  $M6-M0-M2/M3$ . To increase the maximum output current, and to improve the output impedance and efficiency at high load currents, the  $M2/M3$  current mirror is made asymmetric (implemented with resistor  $R1$ ).

Both feedback loops in the circuit of Fig. 5c have only one gain stage (single pole). Such a single-pole system is stable with any load and any biasing above transistor leakage levels. The main drawback of this circuit is its high output impedance, approximately equal to  $1/g_{m1}$ , which limits the accuracy.

The LDO of Fig. 5c was implemented in a TSMC 0.18  $\mu\text{m}$  process with  $I_1 = 100\text{ nA}$  and load current up to  $10\ \mu\text{A}$ . The maximum error of this LDO, which consumes less than  $300\text{ nA}$  at no load, is about  $100\text{ mV}$ .

## 4 The Basic Multiloop LDO Structure

The output impedance of the circuit in Fig. 5a can be improved by increasing the gain in the main feedback loop around  $M1$ . This can be done by inserting a follower between the  $M1$  drain and the output, shown as  $M4$  in Fig. 6a.

This circuit is not yet a regulator, as it needs the current  $I_1$  to function, but it can be used for stability estimation. In addition to the pole set by the output impedance of  $M4$  and the load capacitance (time constant  $C_L/g_{m4}$ ), this circuit has a second pole set by  $g_{m0}$  of the  $M0/M1$  pair and the parasitic capacitance at the gate of  $M4$  (time constant  $C_{p4}/g_{m0}$ ). The worst-case stability occurs when these poles are equal. With a large first-stage gain, the system is then, in theory, at the edge of oscillation, with a phase margin approaching zero; in practice, it may oscillate because of additional high-frequency parasitic poles. Nevertheless, a two-stage system can be unconditionally stable with any capacitive load. This can be achieved by limiting the voltage gain of the first stage [15]. The direct dependence between the first-stage gain and the worst-case phase margin for any capacitive load has been shown in [16].

**Fig. 6** Steps to the any load stable high current LDO

The first-stage ( $M0/M1$ ) voltage gain is  $A = g_{m0}R_{p4}$ , where  $R_{p4}$  is the equivalent resistance at the gate of  $M4$ . Thus, by controlling this resistance, any-load stability of the circuit in Fig. 6a can be achieved. The gain can be controlled with a real resistor or, alternatively, by decreasing the parasitic resistance at the  $M4$  gate through a shorter channel length for  $M0/M1$ , as well as for  $M2/M3$ .
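The claim that limiting the first-stage gain buys stability for any $C_L$ can be checked numerically with a simplified model. The model is an assumption here, not the analysis of [15, 16]: the first stage contributes gain $A$ and one pole, the follower $M4$ contributes unity gain and a pole at $g_{m4}/C_L$, and sweeping the pole ratio `r` stands in for sweeping $C_L$. Frequencies are normalized to the first-stage pole, and the helper names are illustrative.

```python
import math


def two_pole_phase_margin(a: float, r: float) -> float:
    """Phase margin (deg) of L(s) = a / ((1 + s) * (1 + s/r)), poles at 1 and r rad/s."""
    if a <= 1.0:
        return 180.0  # |L| <= 1 everywhere: no unity-gain crossover

    def mag(w: float) -> float:
        return a / math.sqrt((1.0 + w * w) * (1.0 + (w / r) ** 2))

    lo, hi = 0.0, a  # |L(j*a)| < 1, so the crossover lies in (0, a)
    for _ in range(100):  # bisection on the monotonically falling magnitude
        mid = 0.5 * (lo + hi)
        if mag(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    w_c = 0.5 * (lo + hi)
    phase_deg = math.degrees(math.atan(w_c) + math.atan(w_c / r))
    return 180.0 - phase_deg


def worst_case_phase_margin(a: float) -> float:
    """Minimum phase margin over pole ratios r = 1e-3 ... 1e3 (i.e. over C_L)."""
    ratios = [10 ** (k / 100) for k in range(-300, 301)]  # includes r = 1 exactly
    return min(two_pole_phase_margin(a, r) for r in ratios)


for a in (2, 5, 10, 30, 100):
    print(f"first-stage gain A = {a:>3}: worst-case PM = {worst_case_phase_margin(a):.1f} deg")
```

In this model the minimum falls at equal poles, matching the text, and there the phase margin has the closed form $180^\circ - 2\arctan\sqrt{A-1}$: 90° for $A = 2$ and about 37° for $A = 10$. The margin shrinks steadily as $A$ grows, which is why the first-stage gain is deliberately kept low.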

Biasing the  $M0/M1/M2/M3$  amplifier from  $V_{out}$  is another useful feature of the circuit in Fig. 6a. It makes the DC PSRR of the regulator virtually unlimited.

If the voltage gain of the  $M0/M1$  stage is small, the overall gain of the regulator may not be sufficient for acceptable load regulation. This is certainly the case if the load current is  $\sim 10,000 \times$  larger than  $I_0$ . Additional gain, in accordance with structural design principles, should be achieved by nesting stages and boosting the gain of an existing, already stable amplifier, instead of cascading gain stages in series.

One way to boost the output conductance of  $M4$  and improve the load regulation of the LDO is shown in Fig. 6b. This is done by an additional feedback loop, in which the drain current of  $M4$  is compared to  $I_4$  and the difference is amplified by  $M5$ . This new loop also has two poles: the first is set by  $g_{m4}$  and the parasitic capacitance at the gate of  $M5$  (time constant  $C_{p5}/g_{m4}$ ), and the second by  $g_{m5}$  and the load capacitance (time constant  $C_L/g_{m5}$ ). Using the same approach, unconditional stability in this loop can be achieved by decreasing the voltage gain of  $M4$ , which equals  $g_{m4}R_{p5}$ . The equivalent resistance  $R_{p5}$  at the gate of  $M5$  can be controlled either by shortening the  $M4$  channel length or by partially or fully replacing  $I_4$  with a physical resistor ( $I_4 = V_{gs5}/R$ ).

To avoid any additional poles in the  $M4/M5$  feedback loop, transistor  $M4$  should operate as a common-gate device, which requires capacitor  $C_0$  at its gate. Without such a capacitor, the large high-frequency impedance at the  $M4$  gate, combined with the parasitic gate-source capacitance of  $M4$ , will add delay and compromise stability in the  $M4-M5$  loop. The  $C_0$  value should exceed the parasitic gate-source and gate-drain capacitances of  $M4$  (it looks like a parallel compensation capacitor, but it is not!).

As a result, both feedback loops in the circuit are unconditionally stable, ensuring overall system stability with any load.

Finally, to give the LDO pull-up capability, a new gain link is added in parallel to  $M5$ . This link consists of the cascode device  $M6$ , current source  $I_5$  and power transistor  $M7$  (Fig. 7). In the same way, stability in this new loop ( $M4-M6-M7$ ) is achieved by the low resistance of the current source  $I_5$  (which can be obtained by implementing  $I_5$  with a resistor). The value of the voltage source  $V_0$  defines the shoot-through current of  $M5/M7$  at no load.

The LDO of Fig. 7 [17] can sink and source load current, has exceptional bandwidth for any given process and quiescent current, and is stable with any load capacitance. This circuit is the basis for the application-specific variations described below.



**Fig. 7** LDO for radio units

## 5 Low-Noise LDO for Radio Units

The LDO for the radio unit does not require instant current switching. Its most important parameter is the noise curve, as supply noise directly affects the quality of radio transmission and reception. Different components affect noise at different frequencies, and the noise curve of the circuit in Fig. 7 can be shaped to requirements through the choice of parameters.

At low frequencies ( $f < f_1 = g_{m0}/2\pi C_0$ ) the noise is dominated by the  $M0-M3$  amplifier. This is primarily flicker noise, defined almost exclusively by the size of the input devices  $M0/M1$  and independent of the bias current  $I_0$ . In the test LDO,  $I_0$  was set to 1  $\mu$ A. The part of the noise originating in the current-mirror devices ( $M2/M3$ ) can be decreased to a negligible level by using source degeneration resistors.

At frequencies between  $g_{m0}/2\pi C_0$  and the LDO unity-gain frequency, the noise is defined by  $M4$  and  $I_4$ . With a load capacitor present, the unity-gain frequency is  $UGF = A g_{m4}/2\pi C_L$ , where  $A$  is the gain in the  $M4-M5$  loop.  $M4$  should be large enough to have its flicker-noise corner below  $f_1$ , and it should operate in weak inversion. The transistors in the current mirror that forms  $I_4$  should be heavily degenerated with resistors, or  $I_4$  should be implemented with a resistor. The main source of noise at medium frequencies is then the  $M4$  source impedance  $1/g_{m4}$ . Since  $g_{m4}$  is proportional to  $I_4$  in weak inversion, the  $M4$  high-frequency noise is inversely proportional to  $\sqrt{I_4}$ , and  $I_4$  should be as large as possible within the current budget. In the test LDO,  $I_4$  was set to 900  $\mu$ A.

Above the UGF, the LDO noise is filtered and defined by  $C_L$ .
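The $1/\sqrt{I_4}$ scaling can be sketched with a common weak-inversion noise model, which is an assumption here: drain shot noise $\sqrt{2qI_D}$ referred to the gate through $g_m = I_D/(nU_T)$. This is algebraically the same as $\sqrt{2nkT/g_m}$, i.e. the noise of the $1/g_m$ source impedance mentioned above. The slope factor $n = 1.5$ and the temperature are hypothetical, and the helper name is illustrative.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def weak_inversion_noise_density(i_d, n=1.5, temp_k=300.0):
    """Input-referred white-noise voltage density (V/sqrt(Hz)) of a weak-inversion MOSFET.

    Model (assumed): drain shot noise sqrt(2*q*I_D) divided by gm = I_D/(n*U_T),
    which gives n*U_T*sqrt(2*q/I_D), i.e. proportional to 1/sqrt(I_D).
    """
    u_t = K_B * temp_k / Q_E
    gm = i_d / (n * u_t)
    return math.sqrt(2.0 * Q_E * i_d) / gm


for i4 in (10e-6, 100e-6, 900e-6):  # 900 uA is the value used in the test LDO
    en = weak_inversion_noise_density(i4)
    print(f"I4 = {i4 * 1e6:5.0f} uA -> e_n = {en * 1e9:.2f} nV/sqrt(Hz)")
```

Quadrupling $I_4$ halves the white-noise density, so in the middle band each further noise reduction costs proportionally more quiescent current.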

An example of a noise curve shaped by the component parameters is shown in Fig. 8 (thick line: goal; medium line: silicon results; thin line: simulations).

The step-response time constant of the  $M0-M4$  feedback loop is on the order of 20–50  $\mu$ s because of the very small  $I_0$  and large  $C_0$ , as shown in Fig. 9. Thanks to the very fast reaction of the  $M4-M5-M7$  loops, the load is regulated within a 30 mV error window even with only 100 pF load capacitance. Time constants in these loops, due to the large  $I_4$  and the presence of parasitic capacitances only, are on the order of 1–2 ns. The step response is symmetrical for load current steps from zero to 30 mA and back, because this LDO can both sink and source output current.

**Fig. 8** Shaped noise curve of the LDO

**Fig. 9** LDO zero to 30 mA load step response,  $C_L$  = 100 pF

Other parameters of the test LDO, made in a 65 nm Texas Instruments process, are listed in the table below.

| Parameter         | Value                  |
|-------------------|------------------------|
| Quiescent current | 1 mA                   |
| Maximum $I_L$     | 30 mA                  |
| Load capacitance  | > 100 pF               |
| PSRR at 400 kHz   | 25 dB                  |
| PSRR at 20 MHz    | 21 dB                  |
| Die area          | 15,000 $\mu\text{m}^2$ |

## 6 LDO for Digital Units

The current consumption of a digital unit, such as an SRAM bank, can change within a fraction of a nanosecond from zero to maximum (50 to 100 mA) or back.

To avoid missing codes, the power management circuit has to keep the supply voltage of this unit within an error window (50–80 mV from target) even with such steep current variations. The supply voltage of the unit is provided by the LDO and supported by the bypass capacitance built into every gate (500–2000 pF total).

To solve this problem, the LDO should react to load changes within 4–6 ns at most. Because of the wirebond and board bus inductance, an off-chip capacitor is almost worthless for this task: a transient voltage spike of 100 mV or more develops across the 5–10 nH of total (bus + wirebond + board) inductance between the load and the capacitor.
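The size of that spike follows from $V = L\,dI/dt$. A back-of-the-envelope sketch using the inductance and current ranges quoted in this section, with an assumed 1 ns linear current edge (the text says the change happens within a fraction of a nanosecond, so real edges are faster and the spikes larger); the helper name is illustrative.

```python
def inductive_spike(l_henry, di_amp, dt_s):
    """Voltage across a series inductance for a linear current ramp: V = L * dI/dt."""
    if dt_s <= 0:
        raise ValueError("rise time must be positive")
    return l_henry * di_amp / dt_s


# Ranges from the text: 5-10 nH, 50-100 mA steps; the 1 ns edge is an assumption.
for l_nh in (5, 10):
    for di_ma in (50, 100):
        v = inductive_spike(l_nh * 1e-9, di_ma * 1e-3, 1e-9)
        print(f"L = {l_nh} nH, dI = {di_ma} mA in 1 ns -> {v * 1e3:.0f} mV")
```

Even the mildest corner (5 nH, 50 mA, 1 ns) gives 250 mV, well outside a 50–80 mV window, which is why the regulation has to happen on chip.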

Sufficiently fast load regulation has been achieved in a two-loop LDO with a large current consumption ( $\sim 10\%$  of the maximum load current) [4]. As the digital unit can be enabled for long periods of time, such consumption is prohibitive for battery-powered devices; it should be below 50–100  $\mu\text{A}$ .

When the load current in the circuit of Fig. 7 switches from high to low,  $V_{\text{out}}$  rises, causing a rise in the current through  $M4$ , followed by an increase in  $V_{gs5}$ . Transistor  $M5$  then conducts excess current until the  $M7$  gate potential is decreased by the current  $I_5$ . As the current through  $M4$  is practically unlimited, the LDO reacts fast (3–5 ns) when the load switches from high to low.

The critical drawback of the circuit in Fig. 7 is its delayed reaction to an instant load step from zero to high current. The gate of the large pass device  $M7$  has to be charged to the new, larger  $V_{gs7}$ , and the only current available for that is  $I_4$ . As shown in Fig. 9, this is not a problem if  $I_4$  is large. If the total current budget is only 50  $\mu\text{A}$  and  $I_4$  is small, the step reaction delay becomes too large.
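The delay follows from the constant-current charging relation $t = C_{gate}\,\Delta V_{gs}/I_4$. A sketch with a hypothetical effective gate capacitance of 10 pF and a 0.3 V gate swing for $M7$ (neither value is given in the text); the helper name is illustrative.

```python
def gate_slew_time(c_gate_f, delta_v_v, i_charge_a):
    """Time to move a gate by delta_v with a constant charging current: t = C * dV / I."""
    if i_charge_a <= 0:
        raise ValueError("charging current must be positive")
    return c_gate_f * delta_v_v / i_charge_a


# Hypothetical pass-device numbers: 10 pF effective gate capacitance, 0.3 V swing.
c_gate, dv = 10e-12, 0.3
for i4 in (900e-6, 50e-6, 5e-6):  # large I4 (as in Sect. 5) vs. tight current budgets
    t = gate_slew_time(c_gate, dv, i4)
    print(f"I4 = {i4 * 1e6:6.1f} uA -> slew time = {t * 1e9:7.1f} ns")
```

Only the ratio matters: cutting $I_4$ by 100 times stretches the slew by 100 times, from a few nanoseconds to hundreds, which is far beyond the 4–6 ns target.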

A new feedback loop ( $M8-M10/M11-M9-M5$ , Fig. 10) is added to improve the reaction time when the load switches from low to high current [18]. While the load is constant, the current of  $M8$  matches that of  $M4$ , so  $I_{M4} = I_{M8} = I_{M10} = I_{M11} \approx I_6$  (part of  $I_6$  flows through  $M12/R3$ ). This loop has multiple gain stages and needs compensation, which is provided by gain attenuation through  $M12$ .  $R3$  decreases this attenuation when the current through  $M8-M10/M11$  becomes small.



**Fig. 10** LDO for digital units



**Fig. 11** LDO response on the load step

When the load current increases,  $V_{out}$  dips and  $M8$  shuts down, as do  $M10/M11$ .  $I_6$  turns  $M9$  on; thus,  $I_4$  increases, speeding up the charging of the  $M7$  gate capacitance. As a result, the LDO reaction time to load steps becomes as small as 4–5 ns in both directions.

The simulated step response (0–50 mA) of the test LDO is shown in the top part of Fig. 11, with the scope picture below it. No dynamic error was found in silicon; probably the 5 ns/50 mV pulses are too fast to be detected by the available lab equipment.

Other parameters of the test LDO, manufactured in the 65 nm Texas Instruments process, are listed in the table below.

| Parameter                  | Value                  |
|----------------------------|------------------------|
| Quiescent current          | 80 $\mu$A              |
| Maximum $I_L$              | 50 mA                  |
| Step settling within 10 mV | < 200 ns               |
| Load capacitance           | > 200 pF               |
| PSRR at 1 MHz              | 25 dB                  |
| Die area                   | 14,000 $\mu\text{m}^2$ |

The value of the figure of merit (1) for this LDO is 0.003, which is about 10 times better than the 0.032 achieved by the best published circuit, and 50–100 times better than other LDOs with the standard structure of Fig. 1 [4].



**Fig. 12** LDO with class AB step response boosting

The efficiency of the step-response boosting loop in Fig. 10 depends on the absolute value of $R_3$. If $R_3$ is too low, $I_6$ cannot significantly boost the current through $M9$ during the transient. If $R_3$ is too large, the gain in the $M8 - M9$ link may become too large and cause instability. Sensitivity to the $R_3$ value can be eliminated by using a class AB amplifier with low gain for small input signals and high gain for larger ones. Such a circuit is shown in Fig. 12 [19].

If the load current is constant, the current through $M8$ is large, as is the matching current in $M4$: $I_{M8} = I_{M4} = I_{M10} = I_{M11}$. The voltage drop across $R4/R5$ is equal to the $V_{ds}$ of $M12$. The value of $R4/R5$ is chosen to create a 100–150 mV voltage drop, keeping $M12$ out of the triode region of operation. In this mode, the gate potential of $M10/M11/M13$ is high and $M13$ operates as a cascode device for $M12$. Transistor $M12$ operates as a diode in parallel with the gate-source of $M9$, decreasing its gain. Current $I_6$ is split between $M11$ and $M12$, as in the previous circuit in Fig. 10.

If the load current steps up, $V_{out}$ drops and $I_{M8}$ decreases; the current through $M10/M11$ and the voltages across $R4/R5$ decrease as well. The gate potential of $M13$ drops, forcing down the $M12$ drain potential. When $M12$ starts to operate in the triode region, its current decreases, so $I_6$ turns $M9$ on, increasing $I_4$ and allowing fast charging of the gate of transistor $M7$. As a result, the reaction time of the LDO to a load step is as short as 4–5 ns, with $I_6$ of only 5–10 $\mu$A and a total consumption of 40–50 $\mu$A. The circuit is now robust and can operate over wide variations of component parameters and bias currents.

The current capability of $M5$ should be sufficient to absorb the full load current during its transition from large to small. However, for speed and stability reasons, transistor $M5$ should have a low gate capacitance and, consequently, be small. To reconcile these requirements, the additional gain link $R6-M14$ can be added, as shown in Fig. 13.



**Fig. 13** LDOs with increased output current

In both circuits of Fig. 13a and b, transistor $M_{14}$ is off while the load current is stable. If a large pull-down current is required, the current through $M_5$ increases the voltage across $R_6$, turning on $M_{14}$. Because the current of $M_5$ still flows from $V_{out}$, this additional gain does not create a stability problem. The circuits in Fig. 13 illustrate a method of building up loop gain by nesting stages instead of cascading them, which avoids all the compensation problems caused by connecting gain stages in series.

## 7 Verification of the Stability with CAD Tools

Once a circuit solution is chosen and appears to work, the next design step is to verify its robustness, including stability, over the operating temperature range and process variations. When designing a stand-alone LDO, extensive silicon measurements can be made with various loads, temperatures, and production lots. When designing a unit embedded in an SoC, such measurements are not always feasible: the number of SoC test chips may be limited, and thorough measurements may be impossible because of test-environment constraints. Provided that good component models are available, limited silicon measurements can be compensated for by extensive simulations. These simulations must check tens of possible environmental combinations; therefore, an automated procedure is desirable.
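One way to organize such an automated procedure is to enumerate the environmental combinations up front and run one step-transient simulation per entry. The sketch below only builds that list; the process corner names, temperatures, and load currents are hypothetical placeholders (not from the source), and the simulator call is left as a comment because no particular tool interface is assumed.

```python
from itertools import product

# Hypothetical corner sets (assumptions for illustration only); a real flow would
# take these from the process design kit and the LDO specification.
PROCESS_CORNERS = ("tt", "ff", "ss")
TEMPERATURES_C = (-40, 27, 125)
LOAD_CURRENTS_A = (0.0, 0.025, 0.050)  # up to the 50 mA maximum load of the test LDO


def corner_list():
    """Every (process, temperature, load) combination the batch run must cover."""
    return [
        {"process": p, "temp_c": t, "load_a": i}
        for p, t, i in product(PROCESS_CORNERS, TEMPERATURES_C, LOAD_CURRENTS_A)
    ]


if __name__ == "__main__":
    for corner in corner_list():
        # Placeholder: one step-transient simulation per corner would be launched here.
        print(corner)
```

Three values per variable already give 27 runs, which is why the text calls for automation rather than manual inspection.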

By definition, a system is stable if its transient processes settle over time after any input signal. Traditionally, the robustness of feedback-system stability is estimated by the phase margin. The phase margin can easily be extracted from AC simulation results, and a statistical tool can be applied after multiple simulation runs. However, phase margin estimates small-signal stability only and cannot predict large-signal, conditional-stability effects. Being an open-loop parameter, it is not applicable to multiloop system design. Another, less obvious, limitation is that phase margin is informative for minimum-phase systems only, and a transistor in the common-source connection is already not a minimum-phase unit. As a result, phase margin values below 40–50° can be either too pessimistic or too optimistic when used to estimate the settling behavior of the system.
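As an illustration of extracting the phase margin from AC results, the sketch below interpolates the loop-gain phase at the 0 dB crossing. It assumes the loop has already been broken and its complex gain sampled on an ascending frequency grid, a negative-feedback sign convention (PM = 180° + arg L at |L| = 1), and a single crossing inside the band; the function name and the example loop are hypothetical. It is exactly the single-loop, open-loop view whose limitations are listed above.

```python
import numpy as np


def phase_margin_deg(freq_hz, loop_gain):
    """Phase margin (degrees) from a sampled complex loop gain L(j*2*pi*f).

    Assumes freq_hz is ascending and |L| crosses 0 dB exactly once in the band.
    """
    mag_db = 20.0 * np.log10(np.abs(loop_gain))
    phase_deg = np.degrees(np.unwrap(np.angle(loop_gain)))
    below = mag_db <= 0.0
    k = int(np.argmax(below))  # first sample at or below 0 dB
    if k == 0 or not below[k]:
        raise ValueError("no 0 dB crossing inside the sampled band")
    # Linear interpolation of the phase at the exact 0 dB point.
    frac = mag_db[k - 1] / (mag_db[k - 1] - mag_db[k])
    phase_c = phase_deg[k - 1] + frac * (phase_deg[k] - phase_deg[k - 1])
    return 180.0 + phase_c


# Example (illustrative values): L(s) = wn^2 / (s (s + 2 xi wn)), whose closed loop
# is the second-order system wn^2 / (s^2 + 2 xi wn s + wn^2).
xi, wn = 0.5, 2 * np.pi * 1e6
f = np.logspace(3, 9, 20001)
s = 2j * np.pi * f
L = wn**2 / (s * (s + 2 * xi * wn))
pm = phase_margin_deg(f, L)
print(f"phase margin = {pm:.1f} deg")
```

For ξ = 0.5 the example should print about 51.8°, consistent with the correspondence between Q = 1 and 52° given later in this section.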

Bode plots are easy to understand and are instrumental in understanding stability and compensating single-loop systems. AC simulations also require far fewer computational resources, which in the past was another reason for their wide use in design. Today, with abundant fast and distributed computing, we need not limit ourselves to AC simulations or even use them as the main tool. The role of AC simulations is now more to satisfy tradition and the peace of mind of some individuals.

As mentioned before, each loop in a good system should dynamically behave as a first- or second-order system. This may seem restrictive, but on closer inspection it almost never is. In theory, high-order systems can provide faster and more accurate transient processes; in real designs, we normally use nonlinear cells to ensure the required large-signal behavior, or add feedforward links that decrease the system order without degrading the mid-frequency behavior [6, Chapter 4].

The transient-simulation parameters that indicate stability are the step-response overshoot factor $M = (y_{max} - y_{set})/y_{set}$, where $y_{max}$ and $y_{set}$ are the maximum and settled output values, and the damping factor $Q = A_n/(A_{n-1} - A_n)$, where $A_n$ is the peak amplitude of the settling curve during the $n$-th period of settling. For a minimum-phase, linear, second-order single-loop system, these parameters are directly related to the phase margin. The second-order system with the transfer function $A(s) = \frac{A_0\omega_n^2}{s^2 + 2\xi\omega_n s + \omega_n^2}$ has a damping factor of $Q = 1/(2\xi)$ and a phase margin $\Theta = \tan^{-1} \frac{2\xi}{\sqrt{\sqrt{4\xi^4 + 1} - 2\xi^2}}$. For example, phase margins of $43^\circ$, $52^\circ$, $65^\circ$, and $11^\circ$ correspond to $Q = 1.25$, $Q = 1$, $Q = 0.7$, and $Q = 5$, respectively.
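The correspondence between Q and phase margin can be checked numerically. This sketch assumes $Q = 1/(2\xi)$ and the standard phase margin of a unity-feedback loop whose closed-loop response is the second-order system above, $\Theta = \tan^{-1}\left(2\xi/\sqrt{\sqrt{4\xi^4+1}-2\xi^2}\right)$; the function name is our own.

```python
import math


def phase_margin_from_q(q):
    """Phase margin (degrees) of the second-order loop with damping factor Q = 1/(2*xi)."""
    xi = 1.0 / (2.0 * q)
    inner = math.sqrt(4.0 * xi**4 + 1.0) - 2.0 * xi**2  # always positive
    return math.degrees(math.atan(2.0 * xi / math.sqrt(inner)))


for q in (0.7, 1.0, 1.25, 5.0):
    print(f"Q = {q:<4}  ->  phase margin = {phase_margin_from_q(q):.1f} deg")
```

Running it reproduces the pairs quoted in the text to within rounding; the Q = 0.7 case comes out slightly above 65°.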

The damping factor is much more convenient than phase margin for stability estimation. It can be extracted from both small- and large-step transient simulations, covering cases of conditional stability. The damping factor remains valid for non-minimum-phase systems, and because its extraction does not require breaking a loop, it is suitable for stability estimation in multiloop systems.

**Fig. 14** Step settling in the two-loop system

For a small-signal step, the overshoot factor is equivalent to the damping factor for $n = 1$. For a large step, however, the overshoot factor helps to reveal conditional-stability effects caused by component nonlinearity, cutoff, or saturation.
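The two transient definitions can be applied to a sampled waveform as in the sketch below. It assumes a clean, uniformly sampled simulation of a step that settles at $y_{set}$, and reads $A_n$ as the height of the $n$-th positive peak of the settling error $y - y_{set}$; that reading, the function names, and the example values are our assumptions. The formulas are applied literally as stated in the text.

```python
import numpy as np


def overshoot_factor(y, y_set):
    """M = (y_max - y_set) / y_set for a step response that settles at y_set."""
    return (np.max(y) - y_set) / y_set


def peak_amplitudes(y, y_set):
    """A_1, A_2, ...: heights of successive positive peaks of the error y - y_set.

    Assumes a noise-free simulated waveform; measured data would need filtering
    first, because noise adds spurious local maxima.
    """
    e = np.asarray(y, dtype=float) - y_set
    mid = e[1:-1]
    is_peak = (mid > e[:-2]) & (mid >= e[2:]) & (mid > 0.0)
    return mid[is_peak]


def damping_factors(y, y_set):
    """Q_n = A_n / (A_{n-1} - A_n) for each pair of successive peaks."""
    a = peak_amplitudes(y, y_set)
    return a[1:] / (a[:-1] - a[1:])


# Example: an ideal second-order step response (xi = 0.2, fn = 1 MHz), chosen
# only for illustration.
xi, wn, y_set = 0.2, 2 * np.pi * 1e6, 1.0
wd = wn * np.sqrt(1 - xi**2)
t = np.arange(0.0, 10e-6, 1e-10)
y = y_set * (1 - np.exp(-xi * wn * t)
             * (np.cos(wd * t) + xi / np.sqrt(1 - xi**2) * np.sin(wd * t)))
M = overshoot_factor(y, y_set)
q_n = damping_factors(y, y_set)
print(f"M = {M:.3f}, first Q_n = {np.round(q_n[:3], 3)}")
```

For an ideal second-order response all $Q_n$ come out equal, since successive peaks shrink by a constant ratio; differing values along the record are a hint that several loops or nonlinear effects are involved.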

In a multiloop system, the transient responses of different loops overlap, and step settling may appear as shown in Fig. 14. With the traditional brute-force approach to compensating such a system, we would have to extract the transfer function of the full system and then place its poles and zeros using root-based stability methods. As the number of loops grows and the uncertainty in component and load parameters increases, this approach quickly becomes prohibitively complicated.

According to the structural design methodology, we limit the transfer function of each loop to second order by design. In circuits with wide load and environment variations, such as LDOs, these first- or second-order loops can be made unconditionally stable (for example, using the approach described in [15]). According to the theorem in [9], stability of every loop is a sufficient, though not necessary, condition for the stability of the overall system.



**Fig. 15** Using the Fourier transform for tone detection

Stability in the nominal case can be verified with step transient simulations, observing the settling processes at the output and at the key nodes of each loop in the system.

Statistical CAD verification of stability over numerous process and temperature variations can be done with step transient simulations as well. The Fourier transform of the step response yields the tone frequencies (Fig. 15). Once a tone frequency is known, the corresponding damping factor can be calculated for each tone.
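The text does not say how the damping factor of each tone is obtained once its frequency is known. The sketch below assumes one common way: the half-power (−3 dB) bandwidth of each peak in the spectrum of the settling error, giving $Q \approx f_{peak}/\Delta f_{3dB}$, which for a lightly damped tone approaches the $1/(2\xi)$ of the second-order model. The function name, threshold, zero-padding factor, and the synthetic two-tone example (in the spirit of Fig. 14) are illustrative assumptions; the record is expected to be uniformly sampled, to start at the step, and to last until the ringing has died out.

```python
import numpy as np


def tone_q_factors(t, y, y_set, min_rel_height=0.05, pad_factor=16):
    """Return a list of (tone frequency in Hz, estimated Q) for a step response.

    Q is estimated from the half-power bandwidth of each spectral peak (an assumed
    method). No window is applied: the ringing is assumed to have decayed before
    the end of the record, so truncation leakage is negligible.
    """
    dt = t[1] - t[0]
    e = np.asarray(y, dtype=float) - y_set   # settling error, decays to ~0
    n_fft = pad_factor * len(e)              # zero padding refines the frequency grid
    mag = np.abs(np.fft.rfft(e, n_fft))
    freq = np.fft.rfftfreq(n_fft, dt)
    power = mag**2

    # Tones: local spectral maxima above a fraction of the largest magnitude.
    mid = mag[1:-1]
    is_peak = (mid > mag[:-2]) & (mid >= mag[2:]) & (mid > min_rel_height * mag.max())
    tones = []
    for k in np.nonzero(is_peak)[0] + 1:
        half = power[k] / 2.0
        lo = k
        while lo > 0 and power[lo] > half:
            lo -= 1
        hi = k
        while hi < len(power) - 1 and power[hi] > half:
            hi += 1
        if power[lo] > half or power[hi] > half:
            continue                         # bandwidth not resolved; skip this tone
        # Interpolate the half-power crossings between neighbouring bins.
        f_lo = np.interp(half, [power[lo], power[lo + 1]], [freq[lo], freq[lo + 1]])
        f_hi = np.interp(half, [power[hi], power[hi - 1]], [freq[hi], freq[hi - 1]])
        tones.append((freq[k], freq[k] / (f_hi - f_lo)))
    return tones


# Example: hypothetical two-loop settling around a 1.2 V output, with a slow tone
# (1 MHz, Q = 5) and a fast tone (8 MHz, Q = 10).
t = np.arange(0.0, 40e-6, 2e-9)


def ringing(f, q, amplitude):
    sigma = np.pi * f / q                    # decay rate that gives Q = pi*f/sigma
    return amplitude * np.exp(-sigma * t) * np.cos(2 * np.pi * f * t)


y = 1.2 + ringing(1e6, 5.0, 0.05) + ringing(8e6, 10.0, 0.05)
tones = tone_q_factors(t, y, 1.2)
for f_tone, q in tones:
    print(f"tone at {f_tone / 1e6:.2f} MHz, Q = {q:.1f}")
```

The fast tone's estimate comes out a little low because the slow tone's spectral tail overlaps its peak. Closely spaced or heavily damped tones would call for a different per-tone method, such as fitting decaying sinusoids.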

Using the Texas Instruments ACS tool within the TISpice simulator, this procedure has been automated and employed in multiple embedded LDO designs.

## 8 Conclusions

The structural design methodology can be the basis for breakthroughs in all areas of analog design, and it is especially efficient in power management and instrumentation IC design. This methodology deserves to be studied by every analog designer. LDOs developed with the structural methodology surpass standard designs in each and every parameter, often by an order of magnitude. There is hardly an excuse to design a standard-structure LDO any more.

**Acknowledgments** This work was inspired by Keith Kunz and could not have happened without the prodding, support, and feedback of Somshubhra Paul, Sachin Rao Bandigadi, Mangina Prasadu, and other designers from the Texas Instruments embedded power management group.

## References

1. S. Rusu, “Power reduction and management techniques for digital circuits,” Short Course on Embedded Power Management for IC Designers, ISSCC 2008.
2. G.A. Rincon-Mora and P.E. Allen, “Optimized frequency shaping: circuit topologies for LDO,” IEEE TCAS-II, v. 45, #6, pp. 703–708, June 1998.
3. W. Oh and B. Bakkaloglu, “A CMOS low-dropout regulator with current-mode feedback amplifier,” IEEE TCAS-II, v. 54, #10, Oct. 2007.
4. P. Hazucha, T. Karnik, B.A. Bloechel, and C. Parsons, “Area-efficient linear regulator with ultra fast load regulation,” IEEE JSSC, v. 40, #4, pp. 933–940, April 2005.
5. M. Al-Shyoukh, H. Lee, and R. Perez, “A transient-enhanced low-quiescent current low-dropout regulator with buffer impedance attenuation,” IEEE JSSC, v. 42, #8, Aug. 2007.
6. V. Ivanov and I. Filanovsky, “Operational amplifier design using structural design methodology,” Kluwer, 2004.
7. S. Mason, “Feedback theory – further properties of the signal flow graphs,” Proc. IRE, v. 44, #7, pp. 920–926, 1956.
8. H.-P. Schmid, “Circuit transposition using signal-flow graphs,” Proc. ISCAS 2002, v. 2, pp. 25–28.
9. E. Popov, “Linear system control theory,” Moscow, Nauka, 1988 (in Russian).
10. M. Milev and R. Burt, “Tool and methodology for AC-stability analysis of continuous-time closed-loop systems,” Proc. DATE 2005.
11. V. Ivanov and I. Filanovsky, “A 110 dB PSRR/CMRR/gain CMOS micropower operational amplifier,” Proc. ISCAS 2005.
12. G. Polya, “How to solve it,” Princeton University Press, 1971.
13. J. O'Connor and I. McDermott, “The art of systems thinking,” Thorsons, 1997.
14. J. Gerber and V. Ivanov, “Ultra low power class AB LDO circuit for varying load currents,” EU patent application #10 2007 041 155.5, 2007.
15. R. Reay and G. Kovacs, “An unconditionally stable 2-stage CMOS amplifier,” IEEE JSSC, v. 30, #5, pp. 591–594, May 1995.
16. J. Hu, J.H. Huijsing, and K.A.A. Makinwa, “A three-stage amplifier with quenched multipath frequency compensation for all capacitive loads,” Proc. ISCAS 2007, pp. 225–228.
17. V. Ivanov and D. Spady, “Low-voltage class AB output stage with minimal delay,” US patent #6,930,551, 2005.
18. V. Ivanov and K. Kunz, “Any load stable low-drop voltage regulator with instant load regulation,” US patent application 12/008,533, 2008.
19. V. Ivanov, K. Kunz, and M. Prasadu, “Variable gain current input amplifier,” US patent application 12/128,147, 2008.