

REDEFINING SPEED

# 65 nm HIGH SPEED SERIAL LINK TRANSCEIVER



**Supervised by**  
**Prof. Dr. Ahmed Khairy Aboul Seoud**  
**Dr. Sameh Ibrahim**

# **Analog IC design of a high-speed serial link transceiver for the 10GBASE-KR standard using a 65 nm CMOS process**

## **Acknowledgments**

We would like to thank everyone who helped us in our graduation project work and contributed to the knowledge and results presented in this book: our supervisors, Prof. Dr. Ahmed Khairy Abo El-Seoud from Alexandria University, who always tried to make things easier for us, and Dr. Sameh Assem Ibrahim from Ain Shams University, who supported us technically and gave us his time and knowledge throughout the year; Dr. Tamer Aly from UCLA; Eng. Ahmed El-Shenawy from Si-ware, for his first push and support to start working in the analog design field; and the analog IC design graduation project team of the previous year, for being helpful whenever needed and for encouraging us. We also thank our parents for standing beside us and supporting us throughout this year.

# Index

- Sector (1): Transmitter
  - Chapter 1: Introduction
  - Chapter 2: Multiplexer
  - Chapter 3: Equalization
  - Chapter 4: Digital to analog converter
  - Chapter 5: Drivers' topologies
  - Chapter 6: Predrivers
  - Chapter 7: Flipflops
  - Chapter 8: Transmitter's simulation results
- Sector (2): Receiver
  - Chapter 1: Continuous Time Linear Equalizer
  - Chapter 2: Variable Gain Amplifier
  - Chapter 3: Analog to Digital Conversion
  - Chapter 4: High Speed Flash ADC
  - Chapter 5: Behavioral Modeling using Matlab Models
- Sector (3): PLL & CDR
  - Chapter 1: Phase locked loop
  - Chapter 2: Clock and data recovery

**Sector (1)**

**Transmitter in the 10GBASE-KR Ethernet standard**

## **Contents:**

### **1.1 Introduction.**

- 1.1.1 High-Speed Link Overview
- 1.1.2 Electrical Link Circuits
- 1.1.3 Transmitter

### **1.2 Multiplexer.**

- 1.2.1 Multiplexer Architectures
- 1.2.2 Shift-register architecture
- 1.2.3 Different topologies of 2-to-1 MUX
- 1.2.4 Flip-Flop Circuit
- 1.2.5 Latch Circuit
- 1.2.6 Circuit used in the design

### **1.3 Equalization.**

- 1.3.1 Equalization coefficients

### **1.4 Digital To Analog Converter.**

- 1.4.1 Characteristics of DAC
- 1.4.2 Thermometer Code
- 1.4.3 Current steering DAC
- 1.4.4 Basic bias cell
- 1.4.5 Simulation of one cell DAC some switches on and the other off
- 1.4.6 Monte Carlo simulation

### **1.5 Drivers' Topologies.**

- 1.5.1 A Voltage-Mode Driver
- 1.5.2 Current-Mode Drivers
- 1.5.3 Basic Differential Pair
- 1.5.4 Current mode logic Driver

### **1.6 Predrivers.**

### **1.7 Flipflops.**

- 1.7.1 Ultra High-Speed Latch Design

### **1.8 Simulation Results.**

- 1.8.1 Specifications that must be met
- 1.8.2 Simulation results

# Chapter (1)

## Introduction

### 1.1.1 High-Speed Link Overview

High-speed point-to-point electrical link systems employ specialized I/O circuitry that performs incident wave signaling over carefully designed controlled-impedance channels in order to achieve high data rates. As will be described later in this section, the electrical channel's frequency-dependent loss and impedance discontinuities become major limiters in data rate scaling. This section begins by describing the three major link circuit components: (1) the transmitter, (2) the receiver, and (3) the timing system. Next, the section discusses the electrical channel properties that impact the transmitted signal. The section concludes by providing an overview of key technology and system performance metrics.

### 1.1.2 Electrical Link Circuits

Figure 1.1.1 shows the major components of a typical high-speed electrical link system. Due to the limited number of high-speed I/O pins in chip packages and PCB wiring constraints, a high-bandwidth transmitter serializes parallel input data for transmission. Differential low-swing signaling is commonly employed for common-mode noise rejection and reduced crosstalk due to the inherent signal current return path.<sup>7</sup> At the receiver, the incoming signal is sampled, regenerated to CMOS values, and de-serialized. The high-frequency clocks that synchronize the data transfer onto the channel are generated by a frequency-synthesis *phase-locked loop* (PLL) at the transmitter, while at the receiver the sampling clocks are aligned to the incoming data stream by a *timing recovery system*.



Figure 1.1.1

### 1.1.3 Transmitter

The transmitter must generate an accurate voltage swing on the channel while also maintaining proper output impedance in order to attenuate any channel-induced reflections. Either current- or voltage-mode drivers, shown in Fig. 1.1.2, are suitable output stages. Current-mode drivers typically steer a current close to 20 mA between the differential channel lines in order to launch a bipolar voltage swing on the order of $\pm 500$ mV. Driver output impedance is maintained through termination, which is in parallel with the high-impedance current switch. While current-mode drivers are most commonly implemented,<sup>8</sup> the power associated with the required output voltage for proper transistor output impedance and the “wasted” current in the parallel termination led designers to consider voltage-mode drivers. These drivers use a regulated output stage to supply a fixed output swing on the channel through a series termination that is feedback controlled.<sup>9</sup> While the feedback impedance control is not as simple as parallel termination, voltage-mode drivers have the potential to supply an equal receiver voltage swing at a quarter<sup>10</sup> of the common 20 mA cost of current-mode drivers.
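The current figures quoted above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope model: it assumes a 100 Ω differential channel with matched termination at both ends (a value the text does not state), and the function names are illustrative.

```python
# Back-of-the-envelope check of the driver currents quoted above.
# Assumption (not stated in the text): 100-ohm differential channel with
# matched termination at both the transmitter and the receiver.

Z_DIFF = 100.0  # differential termination in ohms (assumed)


def current_mode_swing(i_tail):
    """Differential receiver swing (+/- volts) of a current-mode driver.

    The steered tail current sees the TX and RX terminations in parallel.
    """
    r_load = (Z_DIFF * Z_DIFF) / (Z_DIFF + Z_DIFF)  # 100 || 100 = 50 ohms
    v_pp = i_tail * r_load                          # peak-to-peak differential voltage
    return v_pp / 2.0


def voltage_mode_current(v_swing):
    """Current a voltage-mode driver supplies for a +/- v_swing receiver swing.

    With series termination, the current only flows through the receiver
    termination.
    """
    return v_swing / Z_DIFF


swing = current_mode_swing(20e-3)
i_vm = voltage_mode_current(swing)
print(f"current-mode: 20 mA -> +/-{swing * 1e3:.0f} mV")
print(f"voltage-mode: {i_vm * 1e3:.0f} mA for the same swing")
```

Under these assumptions the voltage-mode driver needs 5 mA, which is the quarter of the 20 mA current-mode cost mentioned above.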



Figure 1.1.2

### 1.1.4 Transmitter Circuit

A transmitter circuit encodes a symbol into a current or voltage for transmission over a line. A good transmitter has an output impedance optimized for the selected transmission mode, rise-time control to avoid injecting energy into tank circuits or reflecting it off line discontinuities, and clean signal levels without offset errors.

Different transmission modes require different transmitter output impedances. Voltage-mode transmitters have very low output impedance and hence are implemented using large FETs operated in the resistive region. At the opposite extreme, current-mode transmitters have very high output impedance and are implemented using FETs operated in saturation. Intermediate between these two extremes are self-series-terminated transmitters that have an output impedance matched to the line and are implemented with moderate-sized FETs operated in the resistive region.



Figure 1.1.3

# Chapter (2)

## Multiplexer

In electronics, a multiplexer (or MUX) is a device that selects one of several analog or digital input signals and forwards the selected input onto a single line. A multiplexer with $2^n$ inputs has $n$ select lines, which are used to select which input line to send to the output. Multiplexers are mainly used to increase the amount of data that can be sent over a network within a certain amount of time and bandwidth. A multiplexer is also called a data selector. An electronic multiplexer makes it possible for several signals to share one device or resource, for example one A/D converter or one communication line, instead of having one device per input signal. In recent years, serialized data transmission has been widely adopted in modern communication systems and I/O devices, where multiplexers convert low-speed parallel data into a high-speed serial data stream. Apart from the required high operating frequency, low power and small area have also become important features for embedded systems.

High-speed links can be limited by both the channel and the circuits. Clock generation and distribution is a key circuit bandwidth bottleneck, and the multiplexing circuitry also limits the maximum data rate.

### 1.2.1 Multiplexer Architectures

Several multiplexer architectures can be used; they are presented in the following paragraphs.

Starting from the 2-to-1 MUX, we can study architectures for a greater number of input channels. Most multiplexers are based on one of two topologies: the tree structure and the shift-register configuration.

#### Tree structure

The tree structure is the natural extension of the 2-to-1 MUX, as illustrated in Figure 1.2.1.



Figure 1.2.1

The idea is to group the input channels in pairs and multiplex each pair, reducing the number by a factor of two after each rank. In this example, Rank 1 is driven by CK/4, Rank 2 by CK/2 and FF out by CK.

The divider delay issue applies to the entire tree structure. While lowering the frequency by a factor of two, each divider must drive twice as many devices as the preceding divider in the chain. Consequently, if all ranks incorporate identical 2-to-1 MUXes, then the $n$-th divider produces an output frequency of $f_{CK}/2^n$ but experiences a total capacitance of $2^n C_M$, where $C_M$ denotes the capacitance introduced by one complete 2-to-1 MUX. In other words, all of the divider stages must satisfy roughly equal speed and delay requirements.

Another difficulty in this architecture is the rapid growth of the hardware and power dissipation with the number of input channels: each additional rank doubles the number of 2-to-1 MUXes. For example, a 16-to-1 MUX tree architecture requires 15 2-to-1 MUXes and 4 dividers. If all of the building blocks employ current steering, such a tree includes roughly 100 current sources.
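The block counts in the 16-to-1 example can be reproduced with a short sketch. It assumes a full binary tree of identical 2-to-1 MUXes with one $\div 2$ divider per rank; the function name and the returned field names are illustrative only.

```python
def tree_mux_hardware(n_inputs):
    """Count the building blocks of an N-to-1 tree MUX made of 2-to-1 MUXes."""
    ranks = n_inputs.bit_length() - 1
    if n_inputs < 2 or 2 ** ranks != n_inputs:
        raise ValueError("n_inputs must be a power of two, at least 2")
    # Rank 1 (the slowest) holds N/2 MUXes; the last rank holds a single MUX.
    muxes_per_rank = [n_inputs // 2 ** r for r in range(1, ranks + 1)]
    return {
        "ranks": ranks,
        "muxes_per_rank": muxes_per_rank,
        "total_muxes": sum(muxes_per_rank),  # always N - 1
        "dividers": ranks,                   # one divide-by-2 stage per rank
    }


hw = tree_mux_hardware(16)
print(hw)
```

For 16 inputs this gives ranks of 8, 4, 2, and 1 MUXes (15 in total) and 4 dividers, matching the figures above.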

To remedy the issues related to power dissipation and divider loading, the 2-to-1 MUXes used in the lower ranks can be scaled down in speed, power, and device dimensions. In Figure 1.2.1, for example, the building blocks in Rank 1 can be scaled down by a factor of two with respect to those in Rank 2.

The tree architecture is the most commonly used MUX topology. Its modular nature and its ability to run at high speeds have made this configuration the dominant choice in transmitter design.

The tree architecture is divided into several types according to the clock rate used:

#### **1.2.1.1 Full rate architecture**

- Static CMOS is usually used in the early stages, while CML is sometimes used in the last stages only, keeping the amount of CML to a minimum to save power, as shown in Figure 1.2.2.
- The full-rate architecture relaxes the clock duty-cycle requirement but limits the maximum data rate.
- It requires generating and distributing a high-speed clock.
- It requires a high-speed flip-flop.
- The maximum clock frequency that can be distributed efficiently is limited by the ability of the clock buffers to propagate narrow pulses.
- A faster clock therefore needs faster clock buffers, usually implemented in CML.
- CML buffers can use inductive peaking.



Figure 1.2.2



Figure 1.2.3

- To increase the data rate, eliminate the final retiming flip-flop and use multiple phases of a slower clock to multiplex the data.

#### 1.2.1.2 Half rate architecture

The half-rate architecture (Figure 1.2.3) eliminates the high-speed clock and the final flip-flop and uses multiple phases of a slower clock to multiplex the data.

The output eye is sensitive to the clock duty cycle.

The critical path no longer includes a flip-flop setup time.

The half-rate architecture uses two clock phases separated by $180^\circ$ to multiplex the data. The $180^\circ$ phase spacing (that is, the duty cycle) is critical for a uniform output eye.
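The duty-cycle sensitivity can be illustrated numerically. In the sketch below, one data stream is assumed to be selected while the clock is high and the other while it is low, so each output bit lasts exactly one clock phase. The 5 GHz clock corresponds to a 10 Gb/s output; the 45% duty cycle is a hypothetical value chosen only for illustration.

```python
def half_rate_bit_widths(f_clk_hz, duty_cycle):
    """Return (bit_a, bit_b) widths in seconds at the output of a half-rate MUX.

    Assumes one stream is selected during the high phase of the clock and
    the other during the low phase.
    """
    period = 1.0 / f_clk_hz
    return duty_cycle * period, (1.0 - duty_cycle) * period


ideal = half_rate_bit_widths(5e9, 0.50)   # ideal 50% duty cycle
skewed = half_rate_bit_widths(5e9, 0.45)  # hypothetical 45% duty cycle

print([round(t * 1e12, 1) for t in ideal], "ps")
print([round(t * 1e12, 1) for t in skewed], "ps")
```

With an ideal clock both bits are 100 ps wide, whereas a 45% duty cycle produces alternating 90 ps and 110 ps bits, that is, a non-uniform output eye.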

#### 1.2.1.3 Increasing Multiplexing Factor – $\frac{1}{4}$ Rate

The multiplexing factor can be increased to allow the distribution of a lower-frequency clock.

The $\frac{1}{4}$-rate architecture distributes a 4-phase clock with phases spaced $90^\circ$ apart.

Both the $90^\circ$ phase spacing and the duty cycle are critical for a uniform output eye.

Higher fan-in MUXes run slower due to the increased capacitance at the MUX output node.

### 1.2.2 Shift-register architecture

The shift-register architecture is based on viewing the multiplexing operation as a parallel-to-serial conversion, as shown in Figure 1.2.4.



Figure 1.2.4

A shift register uses $CK/N$ to load $D_1$-$D_m$ in parallel once every $N T_{CK}$ seconds. The main clock, $CK$, then allows the stored data to be shifted from $FF_1$ to $FF_m$, thereby generating the serial sequence at $D_{out}$.

Owing to its linear growth with the number of channels, the shift-register architecture is more compact and less power-hungry than the tree architecture. For example, if $m=16$, then the MUX requires 16 flip-flops and four $\div 2$ dividers, a total of 20 flip-flops and hence 40 current sources. However, the high-speed clock, $CK$, must now drive all of the data flip-flops in addition to the first divider, i.e., 17 flip-flops in this example. More importantly, the divider delay proves much more critical here than in the tree topology: the delay of the overall divider chain must be small with respect to $N T_{CK}$ in the tree structure, whereas it must be less than $T_{CK}$ in the shift-register architecture. In the above example ($m=16$), the chain of four $\div 2$ circuits may introduce a delay even exceeding $T_{CK}$, thus limiting the utility of the architecture for high-speed operation.
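The counts in the $m=16$ example follow directly from the structure. The sketch below assumes, as the numbers in the text imply, that each $\div 2$ divider is built from one flip-flop and that each flip-flop uses two current sources; the function and field names are illustrative.

```python
def shift_register_mux_hardware(m):
    """Block counts for an m-to-1 shift-register MUX (m a power of two)."""
    dividers = m.bit_length() - 1           # chain of divide-by-2 stages giving CK/m
    if m < 2 or 2 ** dividers != m:
        raise ValueError("m must be a power of two, at least 2")
    total_flip_flops = m + dividers         # data flip-flops plus one per divider
    return {
        "data_flip_flops": m,
        "dividers": dividers,
        "total_flip_flops": total_flip_flops,
        "current_sources": 2 * total_flip_flops,  # two per flip-flop (assumed)
        "ck_load": m + 1,                   # all data flip-flops + first divider
    }


sr = shift_register_mux_hardware(16)
print(sr)
```

For $m=16$ this reproduces the 20 flip-flops, 40 current sources, and 17 flip-flops loading $CK$ quoted above.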

A variant of the shift-register architecture incorporates a multiphase multiplexer. As illustrated in Figure 1.2.5(a), the idea is to sense multiple channels by means of differential pairs, activating each pair by non-overlapping pulses of width $T_{CK}$. As a result, $V_{out}$ is equal to $D_1$ for $t_1 < t < t_2$, equal to $D_2$ for $t_2 < t < t_3$, etc. The non-overlapping pulses can be generated by a ring counter that shifts a single logic ONE each clock cycle [Figure 1.2.5(b)]. The overall $N$-to-1 MUX must therefore be configured as shown in Figure 1.2.5(c). Here the data is loaded into the register by $CK/N$ and subsequently multiplexed by the non-overlapping pulses. [The switches represent the differential pairs in Figure 1.2.5(a).]



Figure 1.2.5
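The ring-counter selection scheme can be sketched behaviorally as follows. This is a purely logical model with ideal, perfectly aligned clocks; it ignores the analog effects (slew rate, glitches) discussed in the text, and all names are illustrative.

```python
def ring_counter(n):
    """Yield one-hot select words: a single logic ONE shifted each clock cycle."""
    state = [1] + [0] * (n - 1)
    while True:
        yield tuple(state)
        state = [state[-1]] + state[:-1]    # rotate the ONE by one position


def multiphase_mux(words):
    """Serialize parallel words: load at CK/N, select one channel per CK cycle."""
    out = []
    for word in words:                      # one parallel load per CK/N period
        selects = ring_counter(len(word))   # restarted here to model CK, CK/N alignment
        for _ in range(len(word)):          # N cycles of the main clock CK
            select = next(selects)
            assert sum(select) == 1         # the select pulses never overlap
            out.append(word[select.index(1)])
    return out


serial = multiphase_mux([(1, 0, 1, 1), (0, 0, 1, 0)])
print(serial)  # [1, 0, 1, 1, 0, 0, 1, 0]
```

Each parallel word appears at the output in channel order, one bit per cycle of $CK$.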

While simplifying the design of the register, the multiphase architecture of Figure 1.2.5(c) still requires precise alignment between $CK$ and $CK/N$. Furthermore, as with the CML circuit, it displays glitches at the output. This effect arises when the last pulse, in $v_4$, and the pulse immediately after it, in $v_1$, multiplex $D_m$ and $D_1$, respectively. Since both data channels may contain edges during the clock transition, the MUX output momentarily falls.

The architecture of Figure 1.2.5(c) requires short, non-overlapping pulses in the voltage domain, necessitating high slew rates for high-speed operation. It is therefore preferable to generate and apply the pulses in the current domain. We surmise that if various phases of a clock are ANDed, the output currents of the AND gates can yield the required pulses. Note that the input channels must be retimed by proper clock phases to avoid glitches at the output.

In practice, a combination of the above architectures can be used. For example, the low-speed ranks may employ a shift-register topology and the high-speed ranks a tree architecture.

### 1.2.3 Different topologies of 2-to-1 MUX

#### 1.2.3.1 2-to-1 CML MUX

We begin our study by first considering a 2-to-1 MUX. Such a circuit must accommodate two random binary inputs, routing each to the output according to the logical level of the “select” command [Fig. 1.2.6(a)]. Shown in Fig. 1.2.6(b) is an implementation example where differential pairs M1-M2 and M3-M4 sense the inputs D1 and D2, respectively, convert the signals to current, and apply the result to the load resistors. The select command determines whether ISS flows through M1-M2 or M3-M4, enabling one data path and disabling the other. As illustrated in Fig. 1.2.6(c), VOUT therefore assumes a new value after each transition of select.



Figure 1.2.6

Since the terminology used for MUXes may appear somewhat confusing, we give an example here. If the multiplexer in Fig. 1.2.6 is designed for 10 Gb/s, then each of the input streams has a rate of 5 Gb/s and the output, DOUT, exhibits the specified rate of 10 Gb/s. Furthermore, the select command is in fact a 5-GHz clock signal that routes D1 to VOUT for one half of its cycle (100 ps) and D2 to VOUT for the other half. We hereafter replace select by the clock (CLK).
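The relationship between these numbers can be written out explicitly. The helper below is a small illustrative sketch (its name and return format are not from the original design).

```python
def mux2_timing(output_rate_bps):
    """Input rate, select-clock frequency and bit time of a 2-to-1 MUX."""
    input_rate = output_rate_bps / 2.0      # each of the two input streams
    clock_hz = output_rate_bps / 2.0        # one clock cycle carries two output bits
    half_cycle = 1.0 / (2.0 * clock_hz)     # time each input is routed to the output
    return input_rate, clock_hz, half_cycle


in_rate, f_clk, t_half = mux2_timing(10e9)
print(f"input streams: {in_rate / 1e9:.0f} Gb/s each")
print(f"select clock:  {f_clk / 1e9:.0f} GHz")
print(f"half cycle:    {t_half * 1e12:.0f} ps")
```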

The MUX configuration in the fig 1.2.6 (b) is among the fastest circuits in a given technology as it employs current steering and contains no feedback loop requiring settling. However the circuit demands certain timing relationships among D1, D2, and the clock. To understand this issue, let us redraw the waveforms of fig 1.2.6 (c) more realistically, including finite rise and fall time but still assuming perfect edge alignment between D1, D2 and CLK. As depicted in fig 1.2.7, at  $t = t_1$ , both D1 and D2 change levels but VOUT must remain equal to D1 immediately before  $t_1$  and equal to D2 immediately after. Unfortunately, since both D1 and D2 cross zero at  $t = t_1$ , the tail current in fig 1.2.6(b) is divided equally between the load resistors regardless of the value of the clock. As a result, VOUT suffers from large glitches.



Figure 1.2.7

The above analysis suggests that one of the input streams must be offset with respect to the other in the time domain so as to avoid simultaneous transitions at both inputs of the MUX. The optimum value of the offset is half a clock period, obtained by introducing a latch in one of the data paths (Fig. 1.2.8). Note that, when CK is high, the latch is “opaque,” holding $D_{1\Delta}$ constant. When CK is low, the latch is “transparent,” but the pair M1-M2 is disabled. If D1 and D2 are aligned with CK, then the edges of $D_{1\Delta}$ are shifted by $T_{CK}/2$ and, at each clock transition, only one of the signals sensed by M1-M2 and M3-M4 may change.



Figure 1.2.8

While avoiding simultaneous transitions at the two MUX inputs, the topology of Fig. 1.2.8 nonetheless requires that $D_1$ and $D_2$ be aligned with $CK$. If, for example, $D_2$ arrives at an arbitrary time, the MUX output still experiences glitches or pulse-width distortion. Thus, if the stages preceding this circuit do not guarantee such an alignment, $D_1$ and $D_2$ must be retimed by means of flip-flops. In the resulting configuration, described here in single-ended form, when $CK$ is high, latches $L_1$, $L_3$, and $L_4$ are opaque and the MUX selects $D_{1\Delta}$. During this time, $L_2$ senses the output of $L_3$ and $L_5$ senses the output of $L_4$. When $CK$ goes low, $D_{1\Delta}$ begins to track the output of $L_2$ whereas $L_5$ becomes opaque, providing a stable level for the MUX. Depending on the environment, it is possible to eliminate $L_1$ and $L_4$, reducing the power consumption (1).

A critical issue in the architecture of Fig 1.2.9 arises from the delay of the  $\div 2$  circuit.



Figure 1.2.9

Figure 1.2.10 illustrates the waveforms for zero and finite delay of the $\div 2$ divider. In Figure 1.2.10(a), the falling edges of the full-rate clock coincide with the midpoint of each bit of V1, optimally retiming the data. In Figure 1.2.10(b), on the other hand, CK/2, and hence the inputs and the output of the MUX, are delayed by $\Delta T$ with respect to the clock. As a result, the sampling point of FFOUT is shifted to the left of the data eye by $\Delta T$, thereby reducing the time that the master latch in FFOUT has for sensing its input. This effect severely degrades the setup and hold margins of FFOUT, limiting the speed.



Figure 1.2.10

Another difficulty in the topology of Fig. 1.2.9 relates to the load seen by the $\div 2$ divider. Here the divider must drive seven differential pairs (the master latch within the divider, the MUX, and L1-L5), experiencing a large load capacitance. This capacitance limits the maximum speed of the divider considerably. The circuit is therefore usually followed by a buffer, but the total delay resulting from both the divider and the buffer may be quite large, intensifying the effect illustrated in Fig. 1.2.10(b).

The architecture of Figure 1.2.9 entails two other sources of delay that must be considered carefully. The reader can show that the delay from the clock input of the 2-to-1 MUX to its output exacerbates the above skew, whereas the delay from the clock input of FFOUT to its output alleviates the problem. In fact, inserting a proper delay in series with this flip-flop's input can, in principle, cancel the divider delay.

#### 1.2.3.2 CMOS Multiplexer

The increasing demand for low-power very-large-scale integration (VLSI) can be addressed at different design levels, such as the architectural, circuit, layout, and process-technology levels. At the circuit design level, considerable potential for power savings exists by means of a proper choice of logic style for implementing combinational circuits. This is because all the important parameters governing power dissipation (switching capacitance, transition activity, and short-circuit currents) are strongly influenced by the chosen logic style. Depending on the application, the kind of circuit to be implemented, and the design technique used, different performance aspects become important, which prevents the formulation of universal rules for optimal logic styles. This section analyzes the 2-to-1 multiplexer using complementary CMOS and pass-transistor logic styles. The implementations are compared based on transistor count, power dissipation, delay, and area.

A circuit that generates an output that exactly reflects the state of one of a number of data inputs, based on the value of one or more control inputs, is called a “multiplexer”. A multiplexer with two data inputs is referred to as a “2-to-1” or “2:1” multiplexer. A commonly used circuit and the graphical symbol for the 2:1 multiplexer are shown in Fig. 1.2.11.



Figure 1.2.11

The logic expression for the multiplexer output is given in equation (1):

$F = \bar{S} X_1 + S X_2$ (1)

A logic style is the way in which a logic function is constructed from a set of transistors. It influences the speed, size, power dissipation, and wiring complexity of a circuit. All these characteristics may vary considerably from one logic style to another and thus make the proper choice of logic style crucial for circuit performance.
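As a quick check of equation (1), the following sketch evaluates the expression for every input combination and prints the truth table. It models only the logic function, not any particular logic style; the function name is illustrative.

```python
from itertools import product


def mux2(s, x1, x2):
    """F = (NOT S) AND X1  OR  S AND X2, with all signals given as 0 or 1."""
    return ((1 - s) & x1) | (s & x2)


print("S X1 X2 | F")
for s, x1, x2 in product((0, 1), repeat=3):
    print(f"{s}  {x1}  {x2} | {mux2(s, x1, x2)}")
```

The output follows $X_1$ whenever $S=0$ and $X_2$ whenever $S=1$.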

##### 1.2.3.2.1 Complementary CMOS Logic style

Any logic function in complementary CMOS is realized by NMOS pull-down and PMOS pull-up networks connected between the gate output and the power lines. Input signals are connected to transistor gates only. Pseudo-NMOS and Cascode Voltage Switch Logic (CVSL) fall under the CMOS ratioed logic family.



Figure 1.2.12

##### 1.2.3.2.2 Pass-transistor Logic style

Pass-transistor logic reduces the number of transistors required to implement logic by allowing the primary inputs to drive gate terminals as well as source-drain terminals. The advantage is that one pass-transistor network (either NMOS or PMOS) is sufficient to perform the logic operation. Several pass-transistor logic styles are considered for implementing the 2-to-1 multiplexer: Complementary Pass-Transistor Logic (CPL), Energy Economized Pass-Transistor Logic (EEPL), Differential Cascode Voltage Switch logic with Pass Gate (DCVSPG), Swing-Restored Pass-transistor Logic (SRPL), Double Pass-transistor Logic (DPL), CMOS transmission gate, Push-Pull Pass-transistor Logic (PPL), LEAP integrated pass-gate logic (LEAP), and the 2T multiplexer.

Among all of these, the 2T multiplexer is optimal. It uses one pMOS and one nMOS transistor, and these two pass transistors at the input select which signal to propagate. The logic levels are degraded by the pass transistors, and the threshold voltages of both pass transistors should be identical for accurate operation. The 2-to-1 multiplexer can be implemented in each of these logic styles.



Figure 1.2.13: Comparison between these CMOS logic styles

### 1.2.4 Flip-Flop Circuit

#### 1.2.4.1 True Single-Phase-Clock Flip-Flop (TSPC)

Conventional latches require both true and complementary clock signals. The True Single-Phase-Clock (TSPC) circuit technique uses only one clock signal that is never inverted, and it fits both static and dynamic CMOS circuits. The topology is shown in Fig. 1.2.14. On the falling edge of the clock, the latch holds its previous state with the help of transistors P1 and N4. On the rising edge of the clock, the D input is sampled through transistors P1, N1, N3, and N4.



Figure 1.2.14

#### 1.2.4.2 Clocked CMOS Flip-Flop (C2CMOS) – A Clock-Skew Insensitive Approach

An ingenious positive edge-triggered register that is based on a master-slave concept and is insensitive to clock overlap is shown in Fig. 1.2.15. This circuit, called the C2CMOS (Clocked CMOS) flip-flop, operates in two phases. When $\text{clk}=0$, the first driver is turned on, and the master stage acts as an inverter, sampling the inverted version of D on the internal node X; the master stage is in evaluation mode. When $\text{clk}=1$, the master stage is in hold mode, while the second section evaluates: the previously stored value is propagated to the output node through the slave stage, which acts as an inverter.
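The two-phase operation can be captured in a small behavioral model. This is a logic-level sketch only: it ignores charge storage, clock overlap, and all transistor-level effects, and the class and attribute names are illustrative.

```python
class C2MOSFlipFlop:
    """Logic-level model of the master-slave operation described above."""

    def __init__(self):
        self.x = 1  # internal node X (holds the inverted data); arbitrary start value
        self.q = 0  # output node

    def step(self, clk, d):
        if clk == 0:
            # Master evaluates: X samples the inverse of D. Slave holds Q.
            self.x = 1 - d
        else:
            # Master holds X. Slave acts as an inverter and drives Q.
            self.q = 1 - self.x
        return self.q


ff = C2MOSFlipFlop()
stimulus = [(0, 1), (1, 0), (0, 0), (1, 1), (0, 1), (1, 1)]  # (clk, D) pairs
trace = [ff.step(clk, d) for clk, d in stimulus]
print(trace)  # [0, 1, 1, 0, 0, 1]
```

In the trace, the output only changes when clk goes high, and it takes the value D had while clk was low. Changes of D during the high phase are ignored, as expected of a positive edge-triggered register.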



Figure 1.2.15

---

### 1.2.5 Latch Circuit

#### 1.2.5.1 Static latches

Static memories use positive feedback to create a *bi-stable circuit*, that is, a circuit having two stable states that represent 0 and 1. The basic idea is shown in Figure 1.2.16a, which shows two inverters connected in cascade along with a voltage-transfer characteristic typical of such a circuit. Also plotted are the VTCs of the first inverter, that is, $V_{o1}$ versus $V_{i1}$, and the second inverter ($V_{o2}$ versus $V_{o1}$). The latter plot is rotated to accentuate that $V_{i2} = V_{o1}$.

Assume now that the output of the second inverter, $V_{o2}$, is connected to the input of the first, $V_{i1}$, as shown by the dotted lines in Figure 1.2.16a. The resulting circuit has only three possible operation points (A, B, and C), as demonstrated on the combined VTC. The following important conjecture is easily proven to be valid: under the condition that the gain of the inverter in the transient region is larger than 1, only A and B are stable operation points, and C is a metastable operation point.



Figure 1.2.16

Suppose that the cross-coupled inverter pair is biased at point $C$. A small deviation from this bias point, possibly caused by noise, is amplified and *regenerated* around the circuit loop. This is a consequence of the gain around the loop being larger than 1. The effect is demonstrated in Figure 1.2.17a. A small deviation $\delta$ is applied to $V_{i1}$ (biased in $C$). This deviation is amplified by the gain of the inverter. The enlarged divergence is applied to the second inverter and amplified once more. The bias point moves away from $C$ until one of the operation points $A$ or $B$ is reached. In conclusion, $C$ is an unstable operation point.

Every deviation (even the smallest one) causes the operation point to run away from its original bias. The chance is indeed very small that the cross-coupled inverter pair is biased at  $C$  and stays there. Operation points with this property are termed *metastable*.
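The regeneration around the loop can be illustrated numerically. The sketch below uses an idealized tanh-shaped inverter characteristic with a normalized 1 V supply and a midpoint gain of 5; these are assumptions made for illustration, not a model of the 65 nm devices.

```python
import math

VDD = 1.0    # normalized supply voltage (assumed)
GAIN = 10.0  # steepness of the assumed inverter characteristic


def inverter(v_in):
    """Idealized inverter VTC; the gain at the midpoint VDD/2 is -GAIN*VDD/2."""
    return 0.5 * VDD * (1.0 - math.tanh(GAIN * (v_in - 0.5 * VDD)))


def settle(v_i1, loops=50):
    """Propagate the voltage at the first input around the two-inverter loop."""
    for _ in range(loops):
        v_i1 = inverter(inverter(v_i1))
    return v_i1


delta = 1e-6  # small deviation from the metastable point
print(settle(0.5 * VDD))          # exactly at the midpoint: stays there
print(settle(0.5 * VDD + delta))  # runs away to the high stable point
print(settle(0.5 * VDD - delta))  # runs away to the low stable point
```

A deviation of only 1 µV is multiplied by the loop gain on every pass, so the circuit leaves the midpoint and settles at one of the two stable points, where the loop gain is far below unity.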



Figure 1.2.17

On the other hand, $A$ and $B$ are stable operation points, as demonstrated in Figure 1.2.17b. In these points, the **loop gain is much smaller than unity**. Even a rather large deviation from the operation point is reduced in size and disappears. Hence the cross-coupling of two inverters results in a *bistable* circuit, that is, a circuit with two stable states, each corresponding to a logic state. The circuit serves as a memory, storing either a 1 or a 0 (corresponding to positions $A$ and $B$). In order to change the stored value, we must be able to bring the circuit from state $A$ to $B$ and vice versa. Since the precondition for stability is that the loop gain $G$ is smaller than unity, we can achieve this by making $A$ (or $B$) temporarily unstable by increasing $G$ to a value larger than 1. This is generally done by applying a trigger pulse at $V_{i1}$ or $V_{i2}$. For instance, assume that the system is in position $A$ ($V_{i1} = 0$, $V_{i2} = 1$). Forcing $V_{i1}$ to 1 causes both inverters to be on simultaneously for a short time and the loop gain $G$ to be larger than 1. The positive feedback regenerates the effect of the trigger pulse, and the circuit moves to the other state ($B$ in this case). The width of the trigger pulse need be only a little larger than the total *propagation delay* around the circuit loop, which is twice the average *propagation delay* of the inverters.

In summary, a bi-stable circuit has two stable states. In absence of any triggering, the circuit remains in a single state (assuming that the power supply remains applied to the circuit), and hence remembers a value. A trigger pulse must be applied to change the state of the circuit. Another common name for a bi-stable circuit is *flip-flop* (unfortunately, an *edge-triggered register* is also referred to as a *flip-flop*).

---

### 1.2.6 Circuit used in the design

The architecture used in this design (Section 1.2.6.1) is a 16-to-1 MUX built from 2-to-1 MUXes. It is therefore divided into four stages, each of which doubles the data rate, in order to reach the desired data rate of 10 Gb/s.

In the early stages we use the CMOS circuits described earlier. In the last stage we use a TSPC flip-flop and latch, which are faster than the CMOS ones, and a CML MUX rather than a CMOS MUX, since it is also faster, in order to reach 10 Gb/s.

The outputs of the four stages and their eye diagrams are shown in the following subsections.
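The data rate at each point of the four-stage tree follows from halving the 10 Gb/s output rate once per stage. The helper below is an illustrative sketch of that arithmetic.

```python
def stage_rates(output_rate_gbps, stages):
    """Per-lane data rate at the tree input and at the output of each stage."""
    rates = [output_rate_gbps / 2 ** stages]  # rate of each parallel input lane
    for _ in range(stages):
        rates.append(rates[-1] * 2)           # every 2-to-1 stage doubles the rate
    return rates


rates = stage_rates(10.0, 4)
for stage, rate in enumerate(rates):
    lanes = 2 ** (len(rates) - 1 - stage)
    print(f"after stage {stage}: {lanes:2d} lane(s) at {rate} Gb/s")
```

Each of the 16 inputs therefore runs at 0.625 Gb/s, and the four stages produce 1.25, 2.5, 5, and 10 Gb/s.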

#### 1.2.6.1 Multiplexer structure



Figure 1.2.18

#### 1.2.6.2 Multiplexer simulation outputs

##### Eye diagrams of the different MUX stages

###### First stage result



###### Second stage result



###### Third stage result



###### Fourth stage result



# **Chapter (3)**

## **Equalization Systems**

In order to extend a given channel's maximum data rate, many communication systems use equalization techniques to cancel inter-symbol interference caused by channel distortion. Equalizers are implemented either as linear filters (both discrete and continuous-time) that attempt to flatten the channel frequency response, or as nonlinear filters that directly cancel ISI based on the received data sequence. Depending on system data rate requirements relative to channel bandwidth and the severity of potential noise sources, different combinations of transmit and/or receive equalization are employed.

Equalization can take place in the transmitter, the receiver, or both; each approach has its costs and benefits. Transmit equalization is relatively easy to implement, and it can provide an open eye for the receiver, thus simplifying the receiver design. Typically, the cost of transmit equalization is that it reduces the DC level of the transmitted signal, resulting in a smaller eye at the receiver. Although the transmit level can be increased to enlarge the eye, the peak transmit level is limited by process, supply voltage, power budget, and crosstalk considerations. For very lossy channels, the equalized eye can become so small that it is not easily sliced at the receiver.

A second issue with transmit equalization is that it is difficult to make it adaptive. Optimal performance requires that the equalizer settings react to changes in channel characteristics due to real-time variations in operating conditions such as temperature. The transmitter cannot inherently “see” the eye it is delivering to the receiver; thus, an adaptive transmit equalizer requires a side-channel or back-channel to convey eye quality information from the receiver back to the transmitter [10]. Since there is currently no standard for such a backchannel, using an adaptive transmitter would limit the core's interoperability.

Receive equalization also has benefits and costs. One benefit of receive equalization is that it is easy to make it adaptive since the eye that needs to be optimized and the equalizer are physically in the same device. A difficulty of receive equalization is that it is hard to make it as precise as transmit equalization. Precision often comes at a cost of considerable power.

Transmit equalization, implemented with a *finite impulse response* (FIR) filter, is the most common technique used in high-speed links.<sup>40</sup> This TX “pre-emphasis” (or more accurately “de-emphasis”) filter, shown in Fig. 1.3.1, attempts to invert the channel distortion that a data bit experiences by pre-distorting or shaping the pulse over several bit times. While this filtering could also be implemented at the receiver, the main advantage of implementing the equalization at the transmitter is that it is generally easier to build high-speed *digital-to-analog converters* (DACs) than receive-side analog-to-digital converters. However, because the transmitter is limited in the amount of peak power that it can send across the channel due to driver voltage headroom constraints, the net result is that the low-frequency signal content has been attenuated down to the high-frequency level, as shown in Fig. 1.3.1.

Figure 1.3.2 shows a block diagram of receiver-side FIR equalization. A common problem faced by linear receiver-side equalization is that high-frequency noise content and crosstalk are amplified along with the incoming signal. Also challenging is the implementation of the analog delay elements, which are often implemented through time-interleaved sample-and-hold stages<sup>41</sup> or through pure analog delay stages with large-area passives.<sup>42,43</sup> Nonetheless, one of the major advantages of receiver-side equalization is that the filter tap coefficients can be adaptively tuned to the specific channel,<sup>41</sup> which is not possible with transmit-side equalization unless a “back-channel” is implemented.<sup>44</sup>



Figure 1.3.1



Figure 1.3.2



Figure 1.3.3

Linear receiver equalization can also be implemented with a continuous-time amplifier. Here, programmable RC-degeneration in the differential amplifier creates a high-pass filter transfer function that compensates the low-pass channel. While this implementation is a simple and low-area solution, one issue is that the amplifier has to supply gain at frequencies close to the full signal data rate. This gain-bandwidth requirement potentially limits the maximum data rate, particularly in time-division de-multiplexing receivers.

The core's transmit equalizer is a standard mixed signal finite impulse response (FIR) structure, often known as pre-emphasis or de-emphasis. Its function is to construct a transmitted output  $Y[n]$  from data  $D[n]$  that implements the equation  $Y[n]=D[n]-\alpha D[n-1]$ .

In Fig. 1.3.3, it can be seen that both the current and previous symbol pulses are shaped by the channel. The equalized response (bottom of the figure) is the difference of the two signals. The result is that a single tap of pre-emphasis can provide ISI cancellation over more than one symbol interval. In this example, the symbol and post-cursor tails perfectly cancel at all subsequent sampling instants. In practice, this cancellation is not perfect and additional pre-emphasis taps can be used to cancel more of the tail or bumps that arise from reflections. However, additional taps increase the transmitter's capacitive loading, require more area and power, and, as mentioned before, reduce the DC level of the signal.
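The one-tap equation $Y[n]=D[n]-\alpha D[n-1]$ can be illustrated with a short behavioural sketch. The mapping of bits to symbols of +1 and -1, the idle symbol assumed before the first bit, and the value $\alpha = 0.25$ are assumptions made only for this example; they are not taken from the design.

```python
def tx_deemphasis(bits, alpha):
    """Apply Y[n] = D[n] - alpha * D[n-1] to a bit sequence.

    Bits are mapped to symbols of +1 / -1. The symbol before the first
    bit is assumed to be -1 (an arbitrary choice for this sketch).
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    previous = -1.0  # assumed idle symbol before the sequence starts
    output = []
    for d in symbols:
        output.append(d - alpha * previous)
        previous = d
    return output


levels = tx_deemphasis([0, 1, 1, 1, 0, 0], alpha=0.25)
print(levels)  # [-0.75, 1.25, 0.75, 0.75, -1.25, -0.75]
```

A bit that follows a transition is sent with amplitude $1+\alpha$, while a repeated bit is sent with amplitude $1-\alpha$. In a peak-limited transmitter the whole waveform is scaled so that the larger value fits the available swing, which is why the low-frequency (repeated-bit) level ends up reduced.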

### 1.3.1 Equalization coefficients

| Setting | C(-1)  | Pre-tap current | C(0)  | Main-tap current | C(1)  | Post-tap current |
|---------|--------|-----------------|-------|------------------|-------|------------------|
| 0       | 0      | 0 mA            | 1     | 20 mA            | 0     | 0 mA             |
| 1       | -0.025 | 0.5 mA          | 0.925 | 18.5 mA          | -0.05 | 1 mA             |
| 2       | -0.05  | 1 mA            | 0.85  | 17 mA            | -0.1  | 2 mA             |
| 3       | -0.075 | 1.5 mA          | 0.775 | 15.5 mA          | -0.15 | 3 mA             |
| 4       | -0.1   | 2 mA            | 0.7   | 14 mA            | -0.2  | 4 mA             |
| 5       | -0.125 | 2.5 mA          | 0.625 | 12.5 mA          | -0.25 | 5 mA             |
| 6       | -0.15  | 3 mA            | 0.55  | 11 mA            | -0.3  | 6 mA             |
| 7       | -0.175 | 3.5 mA          | 0.475 | 9.5 mA           | -0.35 | 7 mA             |

Table 1.3.1
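The pattern in Table 1.3.1 can be reproduced with a few lines of code. This is only a sketch: the 20 mA total current and the step sizes (0.025 per setting for the pre-tap, 0.05 for the post-tap) are read off the table, and the function name `tap_currents` is a hypothetical helper, not part of the design.

```python
I_TOTAL_MA = 20.0  # total driver current in mA, taken from row 0 of Table 1.3.1


def tap_currents(setting):
    """Return (pre, main, post) tap currents in mA for settings 0..7.

    The step sizes (0.025 for the pre-tap, 0.05 for the post-tap) are read
    off the table; the main tap takes whatever current is left.
    """
    if not 0 <= setting <= 7:
        raise ValueError("setting must be between 0 and 7")
    c_pre = -0.025 * setting
    c_post = -0.05 * setting
    c_main = 1.0 - abs(c_pre) - abs(c_post)
    return tuple(round(abs(c) * I_TOTAL_MA, 3) for c in (c_pre, c_main, c_post))


for k in range(8):
    print(k, tap_currents(k))
```

The magnitudes of the three coefficients always add up to one, so the three tap currents always add up to the same 20 mA: equalization redistributes a fixed current budget between the taps rather than adding current.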

Before discussing the choice of driver, we first describe the digital-to-analog converter (DAC), which is responsible for the aggressive equalization required by the 10GBASE-KR Ethernet standard.

## Chapter 4: Digital-to-Analog Converter

The main function of the digital-to-analog converter (DAC) is to control the equalization: opening and closing the switches in each tap changes the tap weight and therefore the equalization. Before showing how the DAC does this, it is useful to review how a DAC operates and what its main characteristics are.

Basic principle and operation of a DAC:

A DAC is a device that produces an analog output A that is proportional to its digital input D.



Figure 1.4.1

$\alpha$: proportionality factor

$\alpha$ sets the dimension and the full-scale range of the analog output.

For example, if $\alpha$ is a current, the analog output can be expressed as follows:

$$A = I_{REF} \cdot D$$

If $\alpha$ is a voltage, the output can be expressed as follows:

$$V_{out} = \frac{D}{2^N} \cdot V_{ref}$$

where N is the number of bits. From these two equations, a DAC can be viewed as a reference multiplication or division function.
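As a minimal sketch of these two ideal transfer functions, the following code evaluates them for a given input code. The function names and the example values (3 bits, 1 V reference) are chosen only for illustration.

```python
def dac_output_voltage(code, n_bits, v_ref):
    """Ideal voltage-output DAC: V_out = (D / 2**N) * V_ref."""
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range for the given resolution")
    return code / 2 ** n_bits * v_ref


def dac_output_current(code, i_ref):
    """Ideal current-output DAC: A = I_REF * D."""
    return i_ref * code


print(dac_output_voltage(4, n_bits=3, v_ref=1.0))  # mid-scale code
print(dac_output_voltage(7, n_bits=3, v_ref=1.0))  # largest code
```

With the division form, the largest code gives an output one LSB below $V_{ref}$, since $D$ can be at most $2^N - 1$.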

## 1.4.1 Characteristics of DAC

### 1.4.1.1 Sources of static errors

Static errors are those that affect the accuracy of the converter when it converts static (DC) signals. They can be completely described by four terms: offset error, gain error, integral nonlinearity and differential nonlinearity. Each can be expressed in LSB units or sometimes as a percentage of the full-scale range (FSR). For example, an error of $\frac{1}{2}$ LSB for an 8-bit converter corresponds to about 0.2% FSR.
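The conversion between LSB units and percent of FSR is simple arithmetic. The sketch below assumes that the full-scale range spans $2^N$ LSBs; the function name is hypothetical.

```python
def lsb_to_percent_fsr(error_lsb, n_bits):
    """Express an error given in LSBs as a percentage of full-scale range.

    Assumes the full-scale range spans 2**n_bits LSBs.
    """
    return 100.0 * error_lsb / 2 ** n_bits


# Half an LSB of an 8-bit converter: 0.5 / 256 = 0.195 %, i.e. about 0.2 % FSR.
print(round(lsb_to_percent_fsr(0.5, 8), 3))
```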

#### 1.4.1.1.1 Offset Error

The offset error, as shown in Figure 1.4.2, is defined as the difference between the nominal and actual offset points. For a DAC it is the step value when the digital input is zero. This error affects all codes by the same amount and can usually be compensated for by a trimming process. If trimming is not possible, this error is referred to as the zero-scale error.



Figure 1.4.2

#### 1.4.1.1.2 Gain Error

The gain error shown in Figure 1.4.3 is defined as the difference between the nominal and actual gain points on the transfer function after the offset error has been corrected to zero. For a DAC it is the step value when the digital input is full scale. This error represents a difference in the slope of the actual and ideal transfer functions and as such corresponds to the same percentage error in each step. This error can also usually be adjusted to zero by trimming.



Figure 1.4.3

#### 1.4.1.1.3 Differential Nonlinearity (DNL) Error

The differential nonlinearity error shown in Figure 1.4.4 (sometimes seen as simply differential linearity) is the difference between an actual step width (for an ADC) or step height (for a DAC) and the ideal value of 1 LSB. Therefore, if the step width or height is exactly 1 LSB, the differential nonlinearity error is zero. If the DNL exceeds 1 LSB, there is a possibility that the converter can become nonmonotonic. This means that the magnitude of the output gets smaller for an increase in the magnitude of the input.



Figure 1.4.4

#### 1.4.1.1.4 Integral Nonlinearity (INL) Error

The integral nonlinearity error shown in Figure 1.4.5 (sometimes seen as simply linearity error) is the deviation of the values on the actual transfer function from a straight line. This straight line can be either a best straight line, which is drawn so as to minimize these deviations, or a line drawn between the end points of the transfer function once the gain and offset errors have been nullified. The second method is called end-point linearity and is the usual definition adopted since it can be verified more directly. For a DAC, the deviations are measured at each step. The name integral nonlinearity derives from the fact that the summation of the differential nonlinearities from the bottom up to a particular step determines the value of the integral nonlinearity at that step.



Figure 1.4.5
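The relation between DNL and INL described above can be sketched numerically. The code below uses the end-point definition and a made-up list of measured output levels; the function name and the data are assumptions for illustration only.

```python
def dnl_inl(levels):
    """Return (dnl, inl) in LSB for a list of measured DAC output levels.

    levels[k] is the output for code k. One LSB is taken as the average
    step between the first and last level (end-point definition), which
    removes offset and gain error.
    """
    n_steps = len(levels) - 1
    lsb = (levels[-1] - levels[0]) / n_steps
    dnl = [(levels[k + 1] - levels[k]) / lsb - 1.0 for k in range(n_steps)]
    inl = [0.0]
    for step_error in dnl:
        inl.append(inl[-1] + step_error)  # INL is the running sum of DNL
    return dnl, inl


measured = [0.0, 1.1, 1.9, 3.0, 4.0]  # made-up levels in arbitrary units
dnl, inl = dnl_inl(measured)
# A DAC step is non-monotonic when its DNL is below -1 LSB.
monotonic = all(d > -1.0 for d in dnl)
print(dnl, inl, monotonic)
```

With the end-point definition the INL is zero at the first and last code by construction, because the reference line passes through both end points.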

#### 1.4.1.1.5 Latency

This specification defines the total time from the moment that the input digital word changes to the time the analog output value has settled to within a specified tolerance. Latency should not be confused with settling time, since latency includes the delay required to map the digital word to an analog value plus the settling time.

#### 1.4.1.1.6 Glitch impulse area

The glitch impulse area is the maximum area of any extraneous glitch that appears at the output after the input code changes.



Figure 1.4.6

Among these parameters, DNL and INL are usually determined by the accuracy of the reference multiplication or division. Settling time and delay are functions of output loading and switching speed. Glitch impulse depends on the D/A converter architecture.

The linearity of D/A converters strongly depends on the accuracy of the reference multiplication or division employed to generate the output levels. The electrical quantities voltage, current and charge can be multiplied or divided using resistor ladders, current-steering arrays and switched-capacitor circuits, respectively. Our interest here is the current-steering DAC; other types are beyond the scope of this book. Before turning to the current-steering DAC architecture, it is useful to understand why thermometer code is the most widely used code for implementing current-steering DACs.

---

### 1.4.2 Thermometer Code

There are lots of coding schemes in the digital world. Obviously binary is most familiar; we all learned Gray coding in school. And there are much more complex coding schemes like 8B/10B that have to address a variety of concerns like DC drift. With current-steering DACs, switches are used to route current into resistors (or some sensing element that can measure the amount of current flowing). The switches are driven by the digital value representing the number to be converted to analog; the amount of current sensed becomes the analog value. However, if, when changing digital values, some switches open and others close, you will naturally have a mismatch between the “make” time of the closing switches and the “break” time of the opening switches. That means that, for an instant, you might have a combination of switches closed that is just transitional; this ends up creating a glitch at the output as the values resolve and stabilize.

In order to avoid this, “thermometer code” is used. This code is so simplistic it's almost brain-dead. There's simply a bit for each discrete possible value. So, for instance, an 8-bit code can represent the numbers 0-8. 0 would be 00000000; 5 would be 11111000; and 8 would be 11111111. It's called thermometer code because, if you turned it 90 degrees with the LSB at the bottom, then, as the number increased or decreased, the place where the 1s stop would rise and fall just like the mercury in an old thermometer. The real benefit of this is that, when the value changes, the changing switches are either all opening or all closing. There's no mix of opening and closing switches, and so there's no glitching. Of course, the tradeoff is in the number of bits required. It's completely linear; there's no compression whatsoever. File it under “N” for “No free lunch.” The remaining question is how to convert binary codes to thermometer code. The mapping for a 3-bit binary input is given in the following table.

| Binary |   |   | Thermometer    |                |                |                |                |                |                |
|--------|---|---|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| A      | B | C | T <sub>1</sub> | T <sub>2</sub> | T <sub>3</sub> | T <sub>4</sub> | T <sub>5</sub> | T <sub>6</sub> | T <sub>7</sub> |
| 0      | 0 | 0 | 0              | 0              | 0              | 0              | 0              | 0              | 0              |
| 0      | 0 | 1 | 1              | 0              | 0              | 0              | 0              | 0              | 0              |
| 0      | 1 | 0 | 1              | 1              | 0              | 0              | 0              | 0              | 0              |
| 0      | 1 | 1 | 1              | 1              | 1              | 0              | 0              | 0              | 0              |
| 1      | 0 | 0 | 1              | 1              | 1              | 1              | 0              | 0              | 0              |
| 1      | 0 | 1 | 1              | 1              | 1              | 1              | 1              | 0              | 0              |
| 1      | 1 | 0 | 1              | 1              | 1              | 1              | 1              | 1              | 0              |
| 1      | 1 | 1 | 1              | 1              | 1              | 1              | 1              | 1              | 1              |

$$\begin{aligned}
T_1 &= A + B + C, & T_5 &= A(B + C),\\
T_2 &= A + B, & T_6 &= AB,\\
T_3 &= A + BC, & T_7 &= ABC,\\
T_4 &= A
\end{aligned}$$

Figure 1.4.7
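The logic equations of the binary-to-thermometer converter can be checked with a short script. This is a behavioural sketch only: A is taken as the most significant bit, `+` in the equations is read as logical OR and a product as logical AND, and the function name is hypothetical.

```python
def binary_to_thermometer(a, b, c):
    """Convert a 3-bit binary code (A is the MSB) to thermometer bits T1..T7."""
    a, b, c = bool(a), bool(b), bool(c)
    t = [
        a or b or c,      # T1
        a or b,           # T2
        a or (b and c),   # T3
        a,                # T4
        a and (b or c),   # T5
        a and b,          # T6
        a and b and c,    # T7
    ]
    return [int(bit) for bit in t]


for value in range(8):
    a, b, c = (value >> 2) & 1, (value >> 1) & 1, value & 1
    code = binary_to_thermometer(a, b, c)
    # The number of ones must equal the binary value, with all ones first.
    assert code == [1] * value + [0] * (7 - value)
    print(a, b, c, code)
```

The assertion inside the loop confirms that, for every input from 0 to 7, the number of ones in the thermometer code equals the binary value.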

### 1.4.3 Current steering DAC

The current-steering DAC, which is based on the switched-current technique, is suitable for high-speed applications and is therefore the most frequently used DAC architecture in wideband communication applications. All current sources are equal and are controlled by a thermometer code, so that when the digital input increases by 1 LSB, one additional current source is switched to the output.



Figure 1.4.8
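A behavioural sketch of such a unary array is given below. It models only the ideal behaviour (identical unit currents, ideal switches); the 1 mA unit current and the helper names are assumptions for the example.

```python
def unary_dac_current(thermometer_bits, i_unit):
    """Output current of a unary current-steering DAC.

    Each thermometer bit that is 1 steers one unit current source to the
    output, so the output is simply (number of ones) * i_unit.
    """
    return sum(thermometer_bits) * i_unit


def switches_toggled(old_bits, new_bits):
    """Return (turned_on, turned_off) counts for a code transition."""
    turned_on = sum(1 for o, n in zip(old_bits, new_bits) if not o and n)
    turned_off = sum(1 for o, n in zip(old_bits, new_bits) if o and not n)
    return turned_on, turned_off


code_3 = [1, 1, 1, 0, 0, 0, 0]
code_4 = [1, 1, 1, 1, 0, 0, 0]
print(unary_dac_current(code_4, i_unit=1e-3))   # output current in amperes
print(switches_toggled(code_3, code_4))         # thermometer 3 -> 4: (1, 0)
print(switches_toggled([0, 1, 1], [1, 0, 0]))   # binary 3 -> 4: (1, 2)
```

For the same step from 3 to 4, the thermometer-coded array only turns one switch on, whereas a binary-weighted array turns one switch on and two off at the same time. The second case is where make/break timing mismatch produces a glitch.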

#### 1.4.3.1 Advantage of current steering DAC:-

- 1) Guaranteed monotonicity: the transfer characteristic of such an array is a monotonic function of the input.
- 2) Suitable for high-speed applications.
- 3) Reduced glitches (which would otherwise lead to large DNL), since the array is controlled by thermometer code.

### 1.4.3.2 Disadvantage of current steering DAC:-

It cannot be used for high resolution, since an N-bit converter requires $2^N - 1$ current sources. For example, a 13-bit DAC designed with this architecture would need 8,191 current sources on the chip, which is not an insignificant number.

### 1.4.3.3 Different architectures of Current steering DAC Using CMOS technology

#### 1.4.3.3.1 Sooch Architecture:-

##### Advantage:-

It eliminates the possibility of mismatch between the two branch currents and thus reduces power consumption.

##### Disadvantage:-

The main limitation of this high-swing cascode current mirror is that its input voltage is large.



Figure 1.4.9

## 1.4.4 Basic bias cell

### Alternative scheme



Finally, in the previous figure, the drain-source voltage of M5 is only used to bias the source of M6. Therefore, M5 and M6 can be collapsed into one diode-connected transistor whose source is grounded. Call this replacement transistor M7. The aspect ratio of M7 should be a factor of four smaller than the aspect ratios of M1-M4 to maintain the bias conditions. In practice, the aspect ratio of M7 is further reduced to bias M1 past the edge of the active region and to overcome a mismatch in the thresholds of M7 and M2 caused by body effect.

## 1.4.5 Simulation of one DAC cell with some switches on and the others off

The mismatch in $V_{DS}$ is very small, so the branch currents are approximately equal.

## 1.4.6 Monte Carlo simulation

If variations in fabrication process parameters and device mismatch on the same die are not taken into account, some designs may degrade in performance and the overall design yield could be unexpectedly low. Hence, statistical analysis must have a prominent place in the design cycle.
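The idea behind a Monte Carlo yield estimate can be sketched behaviourally. The code below is not the circuit-level simulation used in this work: it assumes Gaussian mismatch of the unit currents with a hypothetical relative standard deviation, and counts a sample as passing when its worst DNL stays within a chosen limit.

```python
import random


def monte_carlo_yield(n_runs, n_sources, sigma_rel, max_dnl_lsb, seed=1):
    """Estimate the fraction of simulated DACs whose worst DNL is in spec.

    Each unit current is drawn as 1 + N(0, sigma_rel). The step between
    adjacent thermometer codes equals one unit current, so the DNL of a
    step is that source's deviation from the mean unit current.
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_runs):
        units = [1.0 + rng.gauss(0.0, sigma_rel) for _ in range(n_sources)]
        lsb = sum(units) / n_sources
        worst_dnl = max(abs(u / lsb - 1.0) for u in units)
        if worst_dnl <= max_dnl_lsb:
            passed += 1
    return passed / n_runs


# Hypothetical numbers: 7 unit sources, 2 % mismatch, 0.05 LSB DNL limit.
print(monte_carlo_yield(n_runs=2000, n_sources=7, sigma_rel=0.02, max_dnl_lsb=0.05))
```

Fixing the seed makes the estimate repeatable, which is convenient when comparing two sizing options against each other.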

### 1.4.6.1 Monte Carlo simulation result for 3 sigma:-



# Chapter (5)

## Drivers' Topologies

### 1.5.1 A Voltage-Mode Driver

Most transmitters require one or more forms of compensation to correct for the effects of variation in process, supply voltage, and temperature. For series-terminated transmitters, the output impedance is compensated by segmenting the driver devices, as is done for termination resistors. The delay lines used for rise-time control are compensated to give consistent rise time across process and voltage corners. Finally, transmitters that derive their own signal levels, such as most current-mode transmitters, require process compensation to keep signal levels constant across process corners.

Very high-speed signaling systems may employ multiplexing transmitters that multiplex several low-speed data channels on the output line under control of a multiphase clock. By reducing the signal swing in the multiplexing output stage, these transmitters can be made to operate at frequencies above the gain-bandwidth product of the process. Multiplexing transmitters require care in generating and distributing timing signals to avoid pulse distortion, skew, and systematic jitter.

### 1.5.2 Current-Mode Drivers

#### 1.5.2.1 Saturated FET Driver

The simplest current-mode driver is just a single FET operated in the saturation region, as illustrated in Figure 1.5.1(a). When the input signal, $in$, is high, the FET is on, with $V_{GT} = V_{DD} - V_{Th}$. As long as the line voltage remains above $V_{GT}$, the device is in saturation and acts as a current source with output impedance determined by the channel-length modulation coefficient of the device. When the input signal is low, the FET is off, no current flows, and the output impedance is effectively infinite. Thus, the FET acts as a unipolar current-mode driver with the on-current being set by the saturation current of the device. Unfortunately, the amount of current sourced by a saturated FET can vary by 2:1 or more across process, voltage, and temperature, making this simple current-mode driver rather inaccurate. This variation can be partly compensated by varying the effective width of the current source device in the same manner that FET resistors are compensated. In this case the current source device is built in a segmented manner, and segments are turned on or off to set the current to the desired value. As with resistor compensation, the process is controlled by an FSM that compares a reference current to the current sourced by a typical driver. The segment settings for this driver are then distributed to all other drivers on the same chip.
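The effect of segment selection can be sketched as follows. This is not the FSM itself: it simply searches for the segment count whose total current is closest to the reference, and all numbers (20 mA target, 16 segments, per-segment currents at three corners) are hypothetical.

```python
def choose_segments(i_ref, i_segment, n_segments):
    """Pick how many segments to enable so the total is closest to i_ref.

    i_segment is the current actually delivered by one segment at the
    present process, voltage and temperature corner.
    """
    best = min(range(n_segments + 1), key=lambda n: abs(n * i_segment - i_ref))
    return best


# Hypothetical numbers: 20 mA target, 16 segments, per-segment current
# varying 2:1 between a slow and a fast corner.
for corner, i_seg in (("slow", 1.5e-3), ("typical", 2.0e-3), ("fast", 3.0e-3)):
    n = choose_segments(20e-3, i_seg, 16)
    print(corner, n, n * i_seg)
```

After compensation the residual error is at most half of one segment's current, so using more and smaller segments tightens the accuracy at the cost of more control bits.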



Figure 1.5.1

### 1.5.2.2 Current-Mirror Drivers

A current mirror can also be used to compensate an output driver for process, voltage, and temperature variations, as illustrated in Figure 1.5.1(b and c). In both configurations, a stable reference current,  $I_{ref}$ , is applied to a diode-connected FET of size  $x$  to generate a bias voltage,  $V_{bias}$ . In the first configuration, Figure 1.5.1 (b),  $V_{bias}$  is applied to the gate of a scaled FET, size  $kx$ , to realize a current source with magnitude  $kI_{ref}$ . This current source is then switched by placing a FET in series with the drain controlled by the input,  $in'$ . When  $in'$  is high the driver sinks current  $kI_{ref}$  from the line. When  $in'$  is low, the series device is off and no current flows. When  $in$  switches, the current is set by the V-I characteristics of the upper device, quadratically following the input slope until the current limit of the lower device is reached.

Placing two devices in series requires that the devices be made large and $V_{bias}$ be made small to provide enough headroom on the current source to allow for the voltage drop across the resistive switching device. The series connection can be eliminated by gating the bias voltage at the current source, as shown in Figure 1.5.1(c), instead of switching the output current. In this configuration an inverter with $V_{bias}$ as its supply voltage is used to drive the gate of the current-source device. When the input, $in$, is low, the gate voltage, $V_g$, is driven to $V_{bias}$, and the driver sinks a current of $kI_{ref}$ from the line. When $in$ is high, $V_g$ is driven to ground, and no current flows into the driver.

With this input-gated configuration the output device can be made about half the size of each of the two series devices in the output-switched driver without running out of headroom across process corners. This results in a driver with one-quarter the area and one-half the output capacitance of the switched driver.

This small size is not without cost. The gated driver has the disadvantage of a slow transient response compared with the output-switched driver. With the gated driver, the gate voltage, $V_g$, rises to $V_{bias}$ exponentially with an RC time constant set by the pre-driver output resistance and the current-source gate capacitance. Several time constants are required for the output current to settle within a small percentage of $kI_{ref}$. With the switched driver, on the other hand, as soon as the gate voltage of the switch device reaches the point where the current source enters saturation, the transient is over, except for the effects of channel-length modulation. For this reason, switched-current drivers are employed in applications where a sharp transient response is important.
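The statement that several time constants are needed can be quantified for a first-order RC response. The sketch below assumes that the relevant capacitance is the gate capacitance of the current-source device and treats only the settling of the gate voltage; the output current depends nonlinearly on that voltage, so its settling will differ somewhat. The resistance and capacitance values are hypothetical.

```python
import math


def time_constants_to_settle(tolerance):
    """Number of RC time constants for a first-order step response
    to come within `tolerance` (a fraction, e.g. 0.01) of its final value."""
    if not 0.0 < tolerance < 1.0:
        raise ValueError("tolerance must be between 0 and 1")
    return -math.log(tolerance)


def gate_settling_time(r_predriver, c_gate, tolerance):
    """Settling time of the gate voltage for a given pre-driver resistance
    (ohms) and current-source gate capacitance (farads)."""
    return r_predriver * c_gate * time_constants_to_settle(tolerance)


print(round(time_constants_to_settle(0.01), 2))  # time constants needed for 1 %
print(gate_settling_time(r_predriver=1e3, c_gate=50e-15, tolerance=0.01))
```

Settling to within 1 % takes about 4.6 time constants, which is why the gated driver is slow unless the pre-driver resistance is made small.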

### 1.5.2.3 Differential Current-Steering Driver

Using a source-coupled pair to steer the current from a current source into one of two legs, as illustrated in Figure 1.5.2(a), has several advantages. First, it gives an extremely sharp transient response because, depending on device sizing, the current switches from 0 to $kI_{ref}$ over about one half volt of input swing. Second, the circuit draws constant current from the supply, reducing the AC component of power supply noise. Finally, the source voltage, $V_g$, is stable, reducing the turn-on transient that results with the switched current-source configuration [Figure 1.5.1(b)] when the switch device turns on and draws a momentary current surge from the line to charge up the capacitance of its source node.



Figure 1.5.2

To reduce output delay, the current-steering driver should be used with a limited-swing pre-driver, as shown in Figure 1.5.2(b). The pre-driver converts the full-swing differential inputs, $in$ and $in'$, to limited-swing gate drive signals, $g$ and $g'$. The loads of the pre-driver are set so that the swing of the gate drive signals is limited to the amount needed to fully switch the differential output pair. This gives minimum output delay, because current starts to switch as soon as $g$ and $g'$ begin to swing. There is no *dead band* at the beginning or end of the signal swing.

The gain-bandwidth product of a differential stage is constant. This driver gets its speed by virtue of its *fractional* stage gain (full-swing to limited-swing in the pre-driver; limited swing to very small signal swing in the driver). With gain less than unity, bandwidth (speed) can be very high. In practice, drivers of this type can be made to switch in just a few  $T$

The current-steering driver is naturally suited to drive a balanced differential line, as shown in Figure 1.5.2. The complementary outputs of the driver are attached to the two conductors of the line. The other end of the line is parallel terminated into a positive termination voltage, $V_T$. The driver provides a balanced AC current drive to the line superimposed on a steady DC current of $kI_{ref}$. The termination supply, $V_T$, sees only the DC current.

The differential driver is also effective in driving a single-ended line, as shown in Figure 1.5.2(d). In this case, the complementary, *line'*, outputs of several drivers are tied together and to the signal return. The signal return in turn is AC-shorted to the termination supply. The net result again is that the termination supply sees only DC current because the sum of the signal and return current for each driver is a constant. Owing to the shared signal return, this configuration is susceptible to signal return crosstalk [5.1].

#### 1.5.2.4 Current Mode Logic

The operation of a current-mode driver is similar, except that in this case the voltage across the transistor is larger and the transistor operates in its saturated current region. For a given current (i.e., swing), a current-mode driver will have a smaller device size than a voltage-mode driver, since the voltage-mode driver must source the current at a much smaller $V_{DS}$. The larger voltage across the device results in higher power dissipation. However, the high impedance of the driver isolates the output signal from ground noise. In addition, system design is made easier by referencing the signal only to the positive supply, because the positive supply can be used as the transmission line signal return. A CML driver is essentially a differential pair, so we discuss the differential pair first [5.2].



Figure 1.5.3

### 1.5.3 Basic Differential Pair

How do we amplify a differential signal? As suggested by the observations in the previous section, we may incorporate two identical single-ended signal paths to process the two phases [Fig. 1.5.3(a)]. Such a circuit indeed offers some of the advantages of differential signaling: high rejection of supply noise, higher output swings, etc. But what happens if $V_{in1}$ and $V_{in2}$ experience a large common-mode disturbance or simply do not have a well-defined common-mode dc level? As the input CM level, $V_{in,CM}$, changes, so do the bias currents of $M1$ and $M2$, thus varying both the transconductance of the devices and the output CM level. The variation of the transconductance in turn leads to a change in the small-signal gain, while the departure of the output CM level from its ideal value lowers the maximum allowable output swings. For example, as shown in Fig. 1.5.3(b), if the input CM level is excessively low, the minimum values of $V_{in1}$ and $V_{in2}$ may in fact turn off $M1$ and $M2$, leading to severe clipping at the output. Thus, it is important that the bias currents of the devices have minimal dependence on the input CM level.

A simple modification can resolve the above issue. Shown in Fig. 1.5.4, the “differential pair”<sup>1</sup> employs a current source $I_{SS}$ to make $I_{D1} + I_{D2}$ independent of $V_{in,CM}$. Thus, if $V_{in1} = V_{in2}$, the bias current of each transistor equals $I_{SS}/2$ and the output common-mode level is $V_{DD} - R_D I_{SS}/2$. It is instructive to study the large-signal behavior of the circuit for both differential and common-mode input variations.



Figure 1.5.4

#### 1.5.3.1 Qualitative Analysis

Let us assume that in Fig. 1.5.4, $V_{in1} - V_{in2}$ varies from $-\infty$ to $+\infty$. If $V_{in1}$ is much more negative than $V_{in2}$, $M1$ is off, $M2$ is on, and $I_{D2} = I_{SS}$. Thus, $V_{out1} = V_{DD}$ and $V_{out2} = V_{DD} - R_D I_{SS}$. As $V_{in1}$ is brought closer to $V_{in2}$, $M1$ gradually turns on, drawing a fraction of $I_{SS}$ from $R_{D1}$ and hence lowering $V_{out1}$. Since $I_{D1} + I_{D2} = I_{SS}$, the drain current of $M2$ decreases and $V_{out2}$ rises. As shown in Fig. 1.5.5(a), for $V_{in1} = V_{in2}$ we have $V_{out1} = V_{out2} = V_{DD} - R_D I_{SS}/2$. As $V_{in1}$ becomes more positive than $V_{in2}$, $M1$ carries a greater current than does $M2$ and $V_{out1}$ drops below $V_{out2}$. For sufficiently large $V_{in1} - V_{in2}$, $M1$ “hogs” all of $I_{SS}$, turning $M2$ off. As a result, $V_{out1} = V_{DD} - R_D I_{SS}$ and $V_{out2} = V_{DD}$. Fig. 1.5.5 also plots $V_{out1} - V_{out2}$ versus $V_{in1} - V_{in2}$.



Figure 1.5.5

The foregoing analysis reveals two important attributes of the differential pair. First, the maximum and minimum levels at the output are well-defined ($V_{DD}$ and $V_{DD} - R_D I_{SS}$, respectively) and independent of the input CM level. Second, the small-signal gain (the slope of $V_{out1} - V_{out2}$ versus $V_{in1} - V_{in2}$) is maximum for $V_{in1} = V_{in2}$, gradually falling to zero as $|V_{in1} - V_{in2}|$ increases. In other words, the circuit becomes more nonlinear as the input voltage swing increases. For $V_{in1} = V_{in2}$, we say the circuit is in equilibrium.
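Both attributes can be reproduced with the standard square-law model of a MOS differential pair. The sketch below neglects channel-length modulation and body effect, uses `beta` as shorthand for $\mu_n C_{ox}(W/L)$, and the component values are hypothetical rather than taken from the design.

```python
import math


def diff_pair_output(delta_vin, i_ss, r_d, beta):
    """Differential output V_out1 - V_out2 of a resistively loaded MOS pair.

    Square-law model without channel-length modulation or body effect.
    beta stands for mu_n * C_ox * (W / L) in A/V^2.
    """
    v_limit = math.sqrt(2.0 * i_ss / beta)  # input at which one side turns off
    if delta_vin >= v_limit:
        delta_i = i_ss
    elif delta_vin <= -v_limit:
        delta_i = -i_ss
    else:
        delta_i = 0.5 * beta * delta_vin * math.sqrt(4.0 * i_ss / beta - delta_vin ** 2)
    return -r_d * delta_i


# Hypothetical values: 2 mA tail current, 250 ohm loads, beta = 20 mA/V^2.
for dv in (-0.6, -0.2, 0.0, 0.2, 0.6):
    print(dv, diff_pair_output(dv, i_ss=2e-3, r_d=250.0, beta=20e-3))
```

The output saturates at $\pm R_D I_{SS}$ once the input difference exceeds $\sqrt{2I_{SS}/\beta}$, and the slope is steepest around zero input, in line with the qualitative description.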

Now let us consider the common-mode behavior of the circuit. As mentioned earlier, the role of the tail current source is to suppress the effect of input CM level variations on the operation of  $M1$  and  $M2$  and the output level. Does this mean that  $V_{in,CM}$  can assume arbitrarily low or high values? To answer this question, we set  $V_{in1} = V_{in2} = V_{in,CM}$  and vary  $V_{in,CM}$  from 0 to  $V_{DD}$ . Fig. 1.5.6(a) shows the circuit with  $I_{SS}$  implemented by an NFET. Note that the symmetry of the pair requires that  $V_{out1} = V_{out2}$ .<sup>[5.3]</sup>



Figure 1.5.6

What happens if $V_{in,CM} = 0$? Since the gate potential of $M1$ and $M2$ is not more positive than their source potential, both devices are off, yielding $I_{D3} = 0$. This indicates that $M3$ is in the deep triode region because $V_b$ is high enough to create an inversion layer in the transistor. With $I_{D1} = I_{D2} = 0$, the circuit is incapable of signal amplification, and $V_{out1} = V_{out2} = V_{DD}$.

Now suppose $V_{in,CM}$ becomes more positive. Modeling $M3$ by a resistor as in Fig. 1.5.6(b), we note that $M1$ and $M2$ turn on if $V_{in,CM} \geq V_{TH}$. Beyond this point, $I_{D1}$ and $I_{D2}$ continue to increase and $V_p$ also rises [Fig. 1.5.6(c)]. In a sense, $M1$ and $M2$ constitute a source follower, forcing $V_p$ to track $V_{in,CM}$. For a sufficiently high $V_{in,CM}$, the drain-source voltage of $M3$ exceeds $V_{GS3} - V_{TH3}$, allowing the device to operate in saturation. The total current through $M1$ and $M2$ then remains constant. We conclude that for proper operation, $V_{in,CM} \geq V_{GS1} + (V_{GS3} - V_{TH3})$.

What happens if $V_{in,CM}$ rises further? Since $V_{out1}$ and $V_{out2}$ are relatively constant, we expect that $M1$ and $M2$ enter the triode region if $V_{in,CM} > V_{out1} + V_{TH} = V_{DD} - R_D I_{SS}/2 + V_{TH}$. This sets an upper limit on the input CM level. In summary, the allowable value of $V_{in,CM}$ is bounded as follows:

$$V_{GS1} + (V_{GS3} - V_{TH3}) \leq V_{in,CM} \leq \min\left[V_{DD} - R_D \frac{I_{SS}}{2} + V_{TH},\; V_{DD}\right]$$
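The bound can be evaluated directly once the operating point is known. The following sketch uses hypothetical values for a 1.2 V supply; none of the numbers are taken from the design in this book, and `v_ov3` is shorthand for the tail overdrive $V_{GS3} - V_{TH3}$.

```python
def input_cm_range(v_dd, r_d, i_ss, v_th, v_gs1, v_ov3):
    """Return (lower, upper) limits of V_in,CM for the resistively loaded pair.

    v_gs1 is the gate-source voltage of M1 at I_SS / 2 and v_ov3 is the
    overdrive V_GS3 - V_TH3 of the tail device.
    """
    lower = v_gs1 + v_ov3
    upper = min(v_dd - r_d * i_ss / 2.0 + v_th, v_dd)
    return lower, upper


# Hypothetical numbers, not taken from the design in this book.
low, high = input_cm_range(v_dd=1.2, r_d=250.0, i_ss=2e-3,
                           v_th=0.35, v_gs1=0.55, v_ov3=0.2)
print(low, high)
```

In this example the triode limit lies above the supply, so the upper bound is set by $V_{DD}$ itself. A larger drop $R_D I_{SS}/2$ across the loads would pull the upper bound below the supply.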

With our understanding of differential and common-mode behavior of the differential pair, we can now answer another important question: How large can the output voltage swings of a differential pair be? As illustrated in Fig. 1.5.7, for $M1$ and $M2$ to be saturated, each output can go as high as $V_{DD}$ but as low as approximately $V_{in,CM} - V_{TH}$. In other words, the higher the input CM level, the smaller the allowable output swings. For this reason, it is desirable to choose a relatively low $V_{in,CM}$, but the preceding stage may not provide such a level easily.

An interesting trade-off exists in the circuit of Fig. 1.5.7 between the maximum value of $V_{in,CM}$ and the differential gain. Similar to a simple common-source stage, the gain of a differential pair is a function of the dc drop across the load resistors. Thus, if $R_D I_{SS}/2$ is large, $V_{in,CM}$ must remain close to ground potential.



Figure 1.5.7

### 1.5.3.2 MOS Differential Pair

Most of the principles studied in the previous section for the bipolar differential pair apply directly to the MOS counterpart as well. For this reason, our treatment of the MOS circuit in this section is more concise. We continue to assume perfect symmetry.

#### 1.5.3.2.1 Large-Signal Analysis

As with the large-signal analysis of the bipolar pair, our objective here is to derive the input/output characteristics of the MOS pair as the differential input varies from very negative to very positive values. From Fig. 1.5.8 [5.4]:



Figure 1.5.8

$$V_{out} = V_{out1} - V_{out2} = -R_D(I_{D1} - I_{D2})$$

To obtain  $I_{D1} - I_{D2}$ , we neglect channel-length modulation and write a KVL around the input network and a KCL at the tail node:

$$V_{in1} - V_{GS1} = V_{in2} - V_{GS2}$$

$$I_{D1} + I_{D2} = I_{ss}$$

Since  $I_D = (1/2)\mu_nC_{ox}(W/L)(V_{GS} - V_{TH})^2$ ,

$$V_{GS} = V_{TH} + \sqrt{\frac{2I_D}{\mu_n C_{ox}\frac{W}{L}}}$$

## 1.5.4 Current Mode Logic Driver



Figure 1.5.9

Current mode logic (CML) buffers are based on the differential circuit topology shown in Figure 1.5.9. A CML buffer is a resistively loaded differential pair biased by a simple NMOS current mirror. The differential pair provides the switching mechanism for the buffer, and the current mirror provides its tail current. The tail source device also suppresses the effects of input common-mode voltage variations on the operation of  $M1$  and  $M2$ . The drain resistors of the differential pair are chosen by the designer to set the RC time constant and the differential output voltage best suited for the application.

The designer should also be aware of the bias conditions for the circuit in Figure 1.5.9 that yield optimal performance. Recall that as the input common-mode voltage changes, so does the bias current (which is the same for  $M1$  and  $M2$ ), which in turn varies the transconductance of the devices and the output common-mode level. Variation of the device transconductance changes the small-signal gain, and deviation of the output common-mode level from its optimal value lowers the maximum output voltage swing.  $V_{in,CM}$  should be high enough to keep the drain-source voltage of the tail source device,  $V_{DS3}$ , greater than  $V_{on3} = V_{GS3} - V_{TH3}$ , thus allowing it to operate in the saturation region. The input common-mode voltage  $V_{in,CM}$  should be bounded as follows:

$$V_{GS} + (V_{GS3} - V_{TH3}) \leq V_{in,CM} \leq \min[V_{DD} - R_D(I_{ss}/2) + V_{TH}, V_{DD}]$$

This equation sets both minimum and maximum values for the input common-mode voltage. As the input common-mode voltage increases, the allowed output voltage swing decreases. Though the above discussion is more from an amplification point of view, it turns out that for CML buffers  $V_{in,CM}$  can be used to determine when the tail source  $M3$  is in saturation. A CML buffer operates under optimal conditions when the tail source device is in saturation and the current of the differential pair switches fully in response to the input from the previous stage. When designing more complex CML circuits, such as an edge-triggered flip-flop composed of CML latches, the designer needs to make sure that the tail source device stays in saturation. For example, the tail source device of a CML latch provides current to two cascoded MOS devices during both track and latch modes. Though these circuits are relatively simple, careless design could push the tail source device of a CML latch into the triode region.  $I_{bias}$  is generated from bias current distribution circuits that feed each CML buffer stage. These circuits will be discussed later in this report.

For the CML buffer to achieve optimal performance, complete current switching should occur in the differential pair. From Figure 1.5.9, it can also be seen that the maximum output differential voltage swing is a function of only the load resistor and the tail current, assuming that complete current switching of the differential pair takes place. As the differential input ( $V_{in1} - V_{in2}$ ) varies from minimum to maximum, each output of the differential pair ( $V_{out1}$  and  $V_{out2}$ ) varies from ( $V_{DD} - R_D I_{SS}$ ) to  $V_{DD}$ . Thus the differential output varies from  $-R_D I_{SS}$  to  $R_D I_{SS}$ . By assigning  $-R_D I_{SS}$  as a logic 0 and  $R_D I_{SS}$  as a logic 1, the CML buffer transmits differential logic signals. The output differential voltage swing of the CML buffer is less than that of a CMOS inverter, which makes CML buffers a better choice for high-speed signaling.
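As a worked example of the swing relations above, the short sketch below computes the single-ended and differential output levels of a fully switched buffer. The supply, load and tail-current values are hypothetical and do not correspond to the final driver sizing.

```python
def cml_output_levels(v_dd, r_d, i_ss):
    """Output levels of a CML buffer, assuming complete current switching."""
    v_high = v_dd              # leg that carries no current
    v_low = v_dd - r_d * i_ss  # leg that carries the whole tail current
    v_diff_max = r_d * i_ss    # differential output spans -v_diff_max .. +v_diff_max
    return v_low, v_high, v_diff_max


# Hypothetical values: 1 V supply, 50-ohm loads, 8 mA tail current.
v_low, v_high, v_diff_max = cml_output_levels(v_dd=1.0, r_d=50.0, i_ss=8e-3)

# Differential logic levels as defined in the text.
logic_levels = {0: -v_diff_max, 1: +v_diff_max}

print(f"single-ended output: {v_low:.2f} V to {v_high:.2f} V")
print(f"differential output: {logic_levels[0]:+.2f} V to {logic_levels[1]:+.2f} V")
```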

Examination of the large-signal behavior of the CML buffer further shows the advantages of using the CML buffer topology for high-speed, low-power integrated circuits. If the input common-mode voltage  $V_{in,CM}$  is bounded as specified above, and there is a small voltage difference between  $V_{in1}$  and  $V_{in2}$ , then there is a corresponding change in the differential output current ( $I_{D1} - I_{D2}$ ) as given by [4]:

$$I_{D1} - I_{D2} = \frac{1}{2} \mu_n C_{ox} \frac{W}{L} (V_{in1} - V_{in2}) \sqrt{\frac{4I_{SS}}{\mu_n C_{ox} \frac{W}{L}} - (V_{in1} - V_{in2})^2} \quad (5)$$

From this equation it can be seen that the differential output current ( $I_{D1} - I_{D2}$ ) is an odd function of the differential input voltage ( $V_{in1} - V_{in2}$ ). When the differential pair is balanced ( $V_{in1} = V_{in2}$ ), the differential output current ( $I_{D1} - I_{D2}$ ) is zero.
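Equation (5) can be evaluated directly to confirm these two properties. The sketch below is illustrative only: `k` stands for $\mu_n C_{ox} W/L$, the numerical values are hypothetical, and the clamp to $\pm I_{SS}$ beyond $\sqrt{2I_{SS}/(\mu_n C_{ox} W/L)}$ is an assumption taken from the standard square-law analysis rather than from equation (5) itself.

```python
import math


def diff_current(dv_in, i_ss, k):
    """I_D1 - I_D2 for a differential input dv_in (V); k = mu_n*C_ox*(W/L) in A/V^2."""
    # Assumption: beyond this input, one device carries all of I_SS.
    dv_max = math.sqrt(2 * i_ss / k)
    if abs(dv_in) >= dv_max:
        return math.copysign(i_ss, dv_in)
    # Equation (5), valid while both devices conduct.
    return 0.5 * k * dv_in * math.sqrt(4 * i_ss / k - dv_in ** 2)


# Hypothetical device: I_SS = 1 mA, mu_n*C_ox*(W/L) = 10 mA/V^2.
I_SS, K = 1e-3, 10e-3
for dv in (-0.5, -0.2, 0.0, 0.2, 0.5):
    print(f"dVin = {dv:+.1f} V -> I_D1 - I_D2 = {diff_current(dv, I_SS, K) * 1e3:+.3f} mA")
```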

As the input differential voltage increases, one transistor in the differential pair carries a larger fraction of the tail current  $I_{SS}$  until the differential input exceeds  $\Delta V_{in,max}$  of the MOS devices of the differential pair. Once this condition is met, one transistor carries all of the current from the tail source device. Since the CML buffer is less sensitive to common-mode noise, and common-mode noise frequently appears in high-speed circuits, the CML buffer is a better choice for driving outputs of circuits of this kind. Another important advantage of CML buffers over the single-ended CMOS inverter is that a non-inverting buffer can be realized with one CML buffer instead of two CMOS inverters, which helps reduce propagation delay.

The designer must operate the differential pair of the CML buffer under full switching conditions. This is done by making sure that the differential input voltage (i.e., the differential output voltage of the previous stage) is greater than  $\Delta V_{in,max}$ . Typically, circuit designers apply a differential input voltage of  $2V_{on}$  to get one leg of a differential pair to carry all of the current of the tail source device. In magnitude,  $2V_{on} > \Delta V_{in,max}$ , so  $2V_{on}$  is an easy parameter to keep in mind when using CML buffers in high-speed integrated circuits: applying a differential input of  $2V_{on}$  to any differential pair guarantees that one leg carries all of the current from the tail source device. Since CMOS inverters are vulnerable to power-ground noise and experience large current surges during voltage switching (which aggravates power-ground noise), tapered CMOS inverter chains experience noisy power-ground wires that degrade the noise margin and add large propagation delays to any prior stages connected to the same power and ground rails. [3] Therefore, CMOS inverters are not suited to drive high-speed signals off-chip.
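The margin between the two quantities can be checked numerically. This is a sketch under the square-law model, which is an assumption here: at balance each device carries $I_{SS}/2$, so the overdrive is $\sqrt{I_{SS}/k}$ with $k = \mu_n C_{ox} W/L$, and the full-switching input is $\sqrt{2I_{SS}/k}$. The device values are hypothetical.

```python
import math


def overdrive_at_balance(i_ss, k):
    """V_on = V_GS - V_TH of M1/M2 when each carries I_SS/2 (square-law model)."""
    return math.sqrt(i_ss / k)


def full_switching_input(i_ss, k):
    """Differential input at which one device carries all of I_SS."""
    return math.sqrt(2 * i_ss / k)


# Hypothetical device: I_SS = 1 mA, mu_n*C_ox*(W/L) = 10 mA/V^2.
v_on = overdrive_at_balance(1e-3, 10e-3)
dv_max = full_switching_input(1e-3, 10e-3)
margin = 2 * v_on / dv_max  # independent of the device values in this model

print(f"V_on = {v_on * 1e3:.0f} mV, dVin,max = {dv_max * 1e3:.0f} mV, 2*V_on/dVin,max = {margin:.3f}")
```

In this model the ratio is always $\sqrt{2}$, which is why a drive of $2V_{on}$ leaves a comfortable margin over the minimum needed for full switching.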

---

## Chapter (6)

### Pre-driver

The pre-driver's job is to switch the gate voltages on the two current-steering transistors in each driver segment in such a way that current is steered smoothly from one output pad to the other. This requires a make-before-break action in the current-steering transistors to avoid allowing the current tail transistor to fall out of saturation and the tail voltage to collapse. Ideally, the voltage on the tail node should remain undisturbed during a switching event; otherwise, the changing tail voltage will inject charge into the signal paths.

To obtain this behavior, the pre-driver's outputs must switch between VDD and a voltage just slightly less than  $V_{tail} + V_{TN}$  ( $V_{TN}$  is the body-effected NFET threshold). The outputs should switch as a complementary pair, and thus a differential pre-driver is preferred. This choice for  $V_{Low}$  out of the pre-driver optimizes several aspects of driver behavior. The voltage crossover between rising and falling outputs of the pre-driver holds both steering devices on at the midpoint of the switching event, and thus the tail voltage will be held approximately constant. The rising pre-driver output almost immediately begins turning on its steering transistor; therefore, there is no delay waiting for the signal to exceed  $V_{TN}$  before switching begins. The signal swing out of the pre-driver is minimized, and consequently the pre-driver can be made small and fast.

Figure 1.6.1 shows an implementation of a pre-driver using the methods described in and the pass-gate multiplexing methods of Figure y.3. Note that the current tails are all drawn at 0.4-$\mu\text{m}$  length to help track the behavior of the output driver. All other FETs are minimum length. The voltage swing on  $dH$ ,  $dL$  depends on the ratio of the PFET loads in the pre-driver to the NFET current tails. Ratioing a PFET against an NFET is generally not a desirable circuit construction, but adequate current steering can be achieved with only moderately accurate tracking of the driver's tail voltage, and thus this implementation works well in practice. [6.1]



Figure 1.6.1

## Chapter (7)

### Latch and Flip-flop

#### 1.7.1 Ultra High-Speed Latch Design

A current-mode logic (CML) latch consists of an input tracking stage, MN1 and MN2, which senses and tracks the data variation, and a cross-coupled regenerative pair, MN3 and MN4, which stores the data. Fig. 1.7.1 shows a CMOS CML latch circuit.

The track and latch modes are determined by the clock signal inputs to a second differential pair, MN5 and MN6. When the signal  $V_{CLK}$  is "HIGH", the tail current  $I_{SS}$  flows entirely to the tracking circuit, MN1 and MN2, thereby allowing  $V_{out}$  to track  $V_{in}$ . In the latch mode, the signal  $V_{CLK}$  goes low and the tracking stage is disabled, whereas the latch pair is enabled and stores the logic state at the output.

Like CML buffers, a CML latch operates with relatively small voltage swings, which are  $4V_{THN}$  peak-to-peak differential. The circuit of Fig. 1.7.1 allows us to implement a high-speed latch. However, the regenerative latch in Fig. 1.7.1 has several shortcomings that lead to complete operation failure at very high frequencies ( $\geq 10\text{ GHz}$ ). The primary limitation is that a single tail current is used for both the tracking and latch circuits. Consequently, the bias conditions of the tracking and latch circuits are tightly related, which severely limits the allowable transistor sizes for reliable latch operation. At ultra-high frequencies ( $\geq 10\text{ GHz}$ ) the parasitic capacitances of transistors MN1 and MN2 degrade the minimum gain required for proper tracking. Therefore, the tail current must be sufficiently high to achieve a wider range of linearity and a larger transconductance. On the other hand, the latch circuit does not need a large bias current at ultra-high frequencies.



Figure 1.7.1



Figure 1.7.2

To address the aforementioned problems, the regenerative CML latch is modified so that the latch circuit and the tracking circuit use two distinct tail currents. Fig. 1.7.2 shows the new CML latch circuit. As observed in Fig. 1.7.2, the tracking stage and the latch stage can now be optimized separately for correct latch operation at ultra-high frequencies. Note that it is important that the source-coupled pair transistors have high gain. This is obviously achieved with larger for each transistor of the cross-coupled pair. However, this technique greatly limits the driving capability. Therefore the CML latch is followed by a CML buffer to recover the logic level.

---

# Chapter (8)

## Simulation and results

### 1.8.1 Specifications that must be met

| Parameter                                                        | Subclause reference | Value                                     | Units |
|------------------------------------------------------------------|---------------------|-------------------------------------------|-------|
| Signaling speed                                                  | 72.7.1.3            | $10.3125 \pm 100$ ppm                     | GBd   |
| Differential peak-to-peak output voltage (max.)                  | 72.7.1.4            | 1200                                      | mV    |
| Differential peak-to-peak output voltage (max.) with TX disabled | 72.6.5              | 30                                        | mV    |
| Common-mode voltage limits                                       | 72.7.1.4            | 0–1.9                                     | V     |
| Differential output return loss (min.)                           | 72.7.1.5            | [See Equation (72–4) and Equation (72–5)] | dB    |
| Common-mode output return loss (min.)                            | 72.7.1.6            | [See Equation (72–6) and Equation (72–7)] | dB    |
| Transition time (20%–80%)                                        | 72.7.1.7            | 24–47                                     | ps    |
| Max output jitter (peak-to-peak)                                 |                     |                                           |       |
| Random jitter <sup>a</sup>                                       | 72.7.1.8            | 0.15                                      | UI    |
| Deterministic jitter                                             |                     | 0.15                                      | UI    |
| Duty Cycle Distortion <sup>b</sup>                               |                     | 0.035                                     | UI    |
| Total jitter                                                     |                     | 0.28                                      | UI    |

<sup>a</sup>Jitter is specified at BER  $10^{-12}$ .

<sup>b</sup>Duty Cycle Distortion is considered part of the deterministic jitter distribution.
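The jitter limits in the table are given in unit intervals (UI). As a convenience, the sketch below converts them to picoseconds at the nominal signaling speed and also evaluates the ±100 ppm rate tolerance. The variable names are illustrative; the numbers come from the table above.

```python
BAUD_RATE = 10.3125e9   # nominal signaling speed in baud
TOLERANCE_PPM = 100     # allowed deviation in parts per million

# One unit interval in picoseconds.
ui_ps = 1e12 / BAUD_RATE

# Jitter limits from the table, in UI.
jitter_limits_ui = {
    "random": 0.15,
    "deterministic": 0.15,
    "duty_cycle_distortion": 0.035,
    "total": 0.28,
}
jitter_limits_ps = {name: ui * ui_ps for name, ui in jitter_limits_ui.items()}

# Absolute signaling-speed tolerance in baud.
rate_tolerance = BAUD_RATE * TOLERANCE_PPM * 1e-6

print(f"1 UI = {ui_ps:.2f} ps, rate tolerance = +/-{rate_tolerance / 1e6:.3f} MBd")
for name, limit in jitter_limits_ps.items():
    print(f"{name}: {limit:.2f} ps")
```

For comparison, the 24–47 ps transition-time window in the same table is roughly a quarter to a half of one unit interval.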



Figure 72–12—Transmitter output waveform

**Table 72–7—Transmitter output waveform requirements related to coefficient update**

| Coefficient update <sup>a</sup> |           |           | Requirements <sup>b</sup>   |                             |                             |
|---------------------------------|-----------|-----------|-----------------------------|-----------------------------|-----------------------------|
| $c(1)$                          | $c(0)$    | $c(-1)$   | $v_1(k) - v_1(k-1)$<br>(mV) | $v_2(k) - v_2(k-1)$<br>(mV) | $v_3(k) - v_3(k-1)$<br>(mV) |
| increment                       | hold      | hold      | -20 to -5                   | 5 to 20                     | 5 to 20                     |
| decrement                       | hold      | hold      | 5 to 20                     | -20 to -5                   | -20 to -5                   |
| hold                            | increment | hold      | 5 to 20                     | 5 to 20                     | 5 to 20                     |
| hold                            | decrement | hold      | -20 to -5                   | -20 to -5                   | -20 to -5                   |
| hold                            | hold      | increment | 5 to 20                     | 5 to 20                     | -20 to -5                   |
| hold                            | hold      | decrement | -20 to -5                   | -20 to -5                   | 5 to 20                     |

<sup>a</sup>Step size requirements for the tap under test apply regardless of the current value of the other taps.

<sup>b</sup>This difference is measured relative to the voltage prior to the assertion coefficient update  $k$  equal to hold.
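Table 72–7 lends itself to an automated check of simulated or measured step sizes. The following is a minimal sketch with hypothetical names (`REQUIREMENTS`, `step_is_compliant`); it only encodes the six rows of the table above and is not part of the standard or of our test bench.

```python
POS = (5.0, 20.0)     # allowed range in mV for a positive step
NEG = (-20.0, -5.0)   # allowed range in mV for a negative step

# (tap, update) -> allowed ranges for the changes in v1, v2 and v3.
REQUIREMENTS = {
    ("c(1)", "increment"): (NEG, POS, POS),
    ("c(1)", "decrement"): (POS, NEG, NEG),
    ("c(0)", "increment"): (POS, POS, POS),
    ("c(0)", "decrement"): (NEG, NEG, NEG),
    ("c(-1)", "increment"): (POS, POS, NEG),
    ("c(-1)", "decrement"): (NEG, NEG, POS),
}


def step_is_compliant(tap, update, dv_mv):
    """Check the three voltage changes (mV) caused by one coefficient update."""
    limits = REQUIREMENTS[(tap, update)]
    return all(lo <= dv <= hi for dv, (lo, hi) in zip(dv_mv, limits))


# Hypothetical measurement after incrementing c(1): changes in v1, v2, v3 in mV.
print(step_is_compliant("c(1)", "increment", (-10.0, 12.0, 8.0)))
```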

## 1.8.2 Simulation results

In our system we use a CML driver with rppoly double termination. The driver is biased by a current-source-based DAC, which converts the digital input to the desired analog current. We control the current of the driver in each FIR equalizer tap to set the desired gain of that tap.
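The effect of the tap gains can be illustrated with a behavioral model of a three-tap FIR equalizer with one pre-cursor, one main and one post-cursor tap, matching the coefficients $c(-1)$, $c(0)$ and $c(1)$ of Table 72–7. This is a sketch only: the coefficient values are hypothetical, and in the actual transmitter each tap weight is set by the DAC current of its driver rather than by a multiplication.

```python
def fir_equalize(symbols, c_pre, c_main, c_post):
    """Three-tap FIR: y[n] = c_pre*x[n+1] + c_main*x[n] + c_post*x[n-1]."""
    out = []
    n = len(symbols)
    for i in range(n):
        nxt = symbols[i + 1] if i + 1 < n else 0.0  # pre-cursor sees the next symbol
        prv = symbols[i - 1] if i > 0 else 0.0      # post-cursor sees the previous symbol
        out.append(c_pre * nxt + c_main * symbols[i] + c_post * prv)
    return out


# Hypothetical normalized tap weights; the magnitudes sum to 1.
C_PRE, C_MAIN, C_POST = -0.1, 0.7, -0.2

data = [-1.0, 1.0, 1.0, 1.0]  # one transition followed by a run of ones
levels = fir_equalize(data, C_PRE, C_MAIN, C_POST)
print([round(v, 3) for v in levels])
```

The symbol right after the transition is sent at a higher level than the repeated symbols that follow it, which is the high-frequency emphasis needed to offset the channel loss.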

### Required output waveform



### Required rise and fall time



### Eye diagram without any equalization



### Eye diagram after the signal passes through a 14-inch channel



### Eye diagram when pre-cursors and post-cursors are enabled to compensate losses in the 14-inch channel



### Eye diagram when the signal passes through the channel with appropriate equalization



## References:

- [1] Yu Chien-Cheng, "Low-Power Double Edge-Triggered Flip-Flop Circuit Design," The 3rd International Conference on Innovative Computing Information and Control (ICICIC'08), IEEE, 2008.
- [2] R. Hossain, L. D. Wronski, and A. Albicki, "Low power design using double edge triggered flip-flops," IEEE Trans. on VLSI Systems, vol. 2, no. 2, pp. 261-265, June 1994.
- [3] M. Pedram, Q. Wu, and X. Wu, "A new design of double edge-triggered flip-flops," in Proc. Asia and South Pacific Design Automation Conference (ASP-DAC), 1998, pp. 417-421.
- [4] Y. Moissiadis, I. Bouras, A. Arapoyanni, and L. Dermentzoglou, "A static differential double edge-triggered flip-flop based on clock racing," Microelectronics Journal, vol. 32, pp. 665-671, 2001.
- [5] Fabian Klass, "Semi-Dynamic Flip-Flops with Embedded Logic," IEEE, 1998.
- [6] Yin-Tsung Hwang, Jin-Fa Lin, and Ming-Hwa Sheu, "Low-Power Pulse-Triggered Flip-Flop Design with Conditional Pulse-Enhancement Scheme," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, IEEE, 2011.
- [7] M. Matsui, H. Hara, Y. Uetani, L. Kim, T. Nagamatsu, Y. Watanabe, A. Chiba, K. Matsuda, and T. Sakurai, "2D DCT macrocell using sense-amplifying pipeline flip-flop scheme," IEEE Journal of Solid-State Circuits, vol. 29, no. 5, pp. 1482-1490, Dec. 1994.
- [8] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. K. Chiu, and M. M. Leung, "Improved sense-amplifier based flip-flop," IEEE Journal of Solid-State Circuits, vol. 35, no. 6, pp. 876-883, Jun. 2000.
- [9] Antonio G. M. Strollo, Davide De Caro, Ettore Napoli, and Nicola Petra, "A Novel High-Speed Sense-Amplifier-Based Flip-Flop," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, no. 11, Nov. 2005.
- [10] D. Markovic and J. Tschanz, "Transmission-gate based flip-flop," U.S. patent 6,642,765, November 2003.
- [11] Jinn-Shyan Wang and Po-Hui Yang, "A Pulse-Triggered TSPC Flip-Flop for High-Speed Low-Power VLSI Design Applications," IEEE, 1998.
- [12] J. Suzuki, K. Odagawa, and T. Abe, "Clocked CMOS calculator circuitry," IEEE Journal of Solid-State Circuits, vol. SC-8, pp. 462-469, Dec. 1973.
- [13] N. Nedovic, V. G. Oklobdzija, and W. W. Walker, "A Clock Skew Absorbing Flip-Flop," 2003 IEEE ISSCC, San Francisco, Feb. 2003.
- [14] Jinn-Shyan Wang, "A new true-single-phase-clocked double-edge-triggered flip-flop for low-power VLSI design," in Proceedings of IEEE ISCAS 1997, pp. 1896-1899.
- [15] W. Zhao and Y. Cao, "New generation of predictive technology model for sub-45nm design exploration," in IEEE Intl. Symp. on Quality Electronic Design, 2006.
- [16] Hamid Partovi, Robert Burd, Udin Salim, Frederick Weber, Luigi DiGregorio, and Donald Draper, "Flow-through Latch and Edge-Triggered Flip-flop Hybrid Elements," IEEE International Solid-State Circuits Conference, 1996.
- [17] William J. Dally and John W. Poulton, Digital Systems Engineering.
- [18] Anantha Chandrakasan, William J. Bowhill, and Frank Fox, Design of High-Performance Microprocessor Circuits.

# **Sector (2)**

## **Receiver**

## **Contents:**

### **Analog Front End (AFE)**

#### **2.1 Continuous Time Linear Equalizer (CTLE)**

- 2.1.1 Introduction
- 2.1.2 Concept
- 2.1.3 Design of CTLE
- 2.1.4 Simulation Results
- 2.1.5 Conclusion

#### **2.2 Variable Gain Amplifier (VGA)**

- 2.2.1 Design
- 2.2.2 Simulation Results

#### **2.3 Analog to Digital Conversion**

- 2.3.1 Introduction
- 2.3.2 Types of ADCs
- 2.3.3 Choice of Converter
- 2.3.4 Concept of Time-Interleaving

#### **2.4 High Speed Flash ADC**

- 2.4.1 Components of Flash ADC
- 2.4.2 Track and Hold (THA)
- 2.4.3 Reference Ladder
- 2.4.4 Comparator Design
- 2.4.5 Encoder design
- 2.4.6 Flash ADC performance

#### **2.5 Behavioral Modeling using Matlab Models**

- 2.5.1 Modified MATLAB Model from AMS Library
- 2.5.2 Detailed MATLAB Model for single ADC

- 2.5.3 Detailed MATLAB Model for interleaved ADC
- 2.5.4 Static performance for single ADC

# Analog Front End (AFE)

## Chapter (1)

### Continuous Time Linear Equalizer (CTLE)

#### 2.1.1 Introduction

The two main sources of signal distortion in digital communication channels are intersymbol interference (ISI) and additive noise. ISI is caused by band-limited channels or multipath propagation and can be characterized by the channel transfer function in the frequency domain or by the impulse response in the time domain.

An example of the ISI problem is shown in the figure below. Additional design work is therefore required to address these non-ideal effects introduced by the channel and to obtain good electrical performance from the interconnect system.



Fig.2.1.1

The noise can be internal to the system or external to the system; if the noise is introduced primarily by electronic components and amplifiers at the receiver, it may be characterized as thermal noise.

The interconnect between transmitter and receiver in a transceiver system behaves as a non-ideal transmission line at high frequency. The received signal in a high-speed link suffers from frequency-dependent attenuation and dispersion. These undesired effects are attributed to skin effect and dielectric loss. Skin effect is a phenomenon where high-frequency current tends to flow only in a shallow layer beneath the surface of the conductor, which limits the effective cross-sectional area for current flow. As a result, the resistance of the transmission line increases and losses are incurred. Dielectric loss, on the other hand, is caused by the dielectric material failing to act as an ideal insulator during signal propagation through the transmission line. Skin effect dominates at lower frequencies, while dielectric loss dominates above roughly 2 GHz because it is proportional to frequency.
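The crossover between the two loss mechanisms can be sketched with a simple two-term loss model. Everything in this example is an assumption made for illustration: skin-effect loss in dB is taken to grow with the square root of frequency, dielectric loss in dB is taken to grow linearly with frequency, and the two coefficients are hypothetical values picked so that the crossover lands at the 2 GHz figure quoted above.

```python
import math

A_SKIN = math.sqrt(2.0)  # dB per sqrt(GHz), hypothetical coefficient
A_DIEL = 1.0             # dB per GHz, hypothetical coefficient


def skin_loss_db(f_ghz):
    """Skin-effect loss, assumed proportional to sqrt(frequency)."""
    return A_SKIN * math.sqrt(f_ghz)


def dielectric_loss_db(f_ghz):
    """Dielectric loss, proportional to frequency."""
    return A_DIEL * f_ghz


def crossover_ghz():
    """Frequency at which both loss terms are equal."""
    return (A_SKIN / A_DIEL) ** 2


for f in (0.5, 2.0, 5.0):
    print(f"{f:.1f} GHz: skin = {skin_loss_db(f):.2f} dB, dielectric = {dielectric_loss_db(f):.2f} dB")
print(f"crossover at {crossover_ghz():.2f} GHz")
```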

As a result, interconnect bandwidth has become the bottleneck in current high-speed systems. In order to support higher data rates, special circuit and signal processing techniques have successfully been used to mitigate the frequency-dependent attenuation in serial data transmission applications.

The channel is a band-limited transmission medium of low-pass type. At the receiver, the resulting distortion must be compensated in order to reconstruct the transmitted symbols, and a high-pass filter can be used for this purpose. This process of suppressing channel-induced distortion is called channel equalization.

An analog equalization approach usually uses a linear equalizer followed by a decision feedback equalizer (DFE), a feed-forward equalizer (FFE), or both.

Equalization is a circuit technique used to compensate for the high-frequency roll-off of the non-ideal channel, so that the interconnect bandwidth can be extended to support a higher data rate. These circuits synthesize the required transfer function in the analog domain.

## 2.1.2 Concept

One approach to implementing equalization is to use a continuous-time linear equalizer (CTLE) that provides gain peaking in order to boost high frequencies and counter the channel attenuation and distortion. The main advantages of this approach are lower power, less silicon area, and most importantly, the capability of performing equalization that can readily be implemented at the transmitter or receiver.

The majority of CTLE circuit implementations are first-order designs. However, a higher-order CTLE is useful when dealing with a severely impaired channel, as it provides greater boost at the expense of area efficiency, power consumption, and circuit complexity.

Fig. 2.1.2 illustrates the concept of an equalizer from a frequency domain point of view. By placing an equalizer in series with the channel, the transfer function of the equalized channel, the product of the transfer functions of the channel and CTLE, becomes flat over a wider frequency range as shown in fig. 2.1.2. A typical channel, without frequency notches, has a low pass characteristic that can be approximated by one or few poles as:

$$H_{ch}(s) = \frac{k_{ch}}{s + p_{ch}}, \quad \text{equ. 2.1.1}$$

where  $p_{ch}$  is the dominant pole of the channel. By placing the CTLE filter in series with the channel, the equalizer selectively adds gain to the high frequencies. Thus, the CTLE has a transfer function that is described as:

$$H_{CTLE}(s) = \frac{\tilde{k}_1(s + z_1)}{(s + p_1)(s + p_2)}, \quad \text{equ. 2.1.2}$$

where  $z_1$  and the  $p_i$  are the zero and poles of the CTLE, respectively. If the zero of the CTLE cancels the dominant pole of the channel, the transfer function of the equalized channel can be described by:

$$H_{EQ}(s) = \frac{k_1}{(s + p_1)(s + p_2)}, \quad \text{equ. 2.1.3}$$



Frequency responses of (a) the channel, (b) the CTLE, and (c) the equalized channel. (● : -3 dB point.)

Fig. 2.1.2

Since the poles of the CTLE are larger than the dominant pole of the channel, the bandwidth of the equalized channel is increased. For example, the -3 dB bandwidth of the channel shown in Fig. 2.1.2 increases from 1.5 GHz to 4.5 GHz. Such an increase in bandwidth can be achieved as long as the channel does not exhibit deep notches. For channels with deep notches, high data rates can be transmitted using advanced signaling or modulation techniques that restrict information transmission around the notches.
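The pole-zero cancellation of equ. 2.1.1 to equ. 2.1.3 can be reproduced numerically. The sketch below is a behavioral illustration, not the designed circuit: the channel pole is set to the 1.5 GHz of the example, while the two CTLE poles (both placed at 7 GHz) and the unity DC gains are hypothetical choices that happen to give an equalized bandwidth of about 4.5 GHz.

```python
import math


def h_channel(s, k_ch, p_ch):
    """Single-pole channel model, equ. 2.1.1."""
    return k_ch / (s + p_ch)


def h_ctle(s, k1, z1, p1, p2):
    """One-zero, two-pole CTLE model, equ. 2.1.2."""
    return k1 * (s + z1) / ((s + p1) * (s + p2))


def bandwidth_3db(h, f_lo=1e6, f_hi=1e12):
    """-3 dB frequency (Hz) of a monotonically falling low-pass response."""
    target = abs(h(0j)) / math.sqrt(2)
    for _ in range(200):
        f_mid = math.sqrt(f_lo * f_hi)  # bisection on a logarithmic axis
        if abs(h(2j * math.pi * f_mid)) > target:
            f_lo = f_mid
        else:
            f_hi = f_mid
    return math.sqrt(f_lo * f_hi)


P_CH = 2 * math.pi * 1.5e9     # channel pole from the example in the text
Z1 = P_CH                      # CTLE zero placed on the channel pole
P1 = P2 = 2 * math.pi * 7e9    # hypothetical CTLE poles
K_CH = P_CH                    # unity DC gain for the channel
K1 = P1 * P2 / Z1              # unity DC gain for the CTLE


def channel(s):
    return h_channel(s, K_CH, P_CH)


def equalized(s):
    return channel(s) * h_ctle(s, K1, Z1, P1, P2)


bw_channel = bandwidth_3db(channel)
bw_equalized = bandwidth_3db(equalized)
print(f"channel: {bw_channel / 1e9:.2f} GHz, equalized: {bw_equalized / 1e9:.2f} GHz")
```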

Figure 2.1.3 shows the ideal responses of the channel and the equalizer and both together plotted on one graph:



Concept of equalization in frequency domain.

Fig. 2.1.3

In order to obtain a high-pass transfer function, enough zeros must be introduced at low frequency to match the shape of the channel response, as shown in figure 2.1.4.



**Channel and linear equalizer transfer function.**

Fig. 2.1.4

From the previous figures we conclude that the peaking gain and the peaking frequency of a CTLE are key design parameters for improving link performance. Many high-speed signaling standards specify the peaking gain of the transmitter or receiver equalizer in order to achieve their target data rates. Besides the peaking gain and frequency, the zeros play a major role in shaping the frequency response of the CTLE. These zeros relate closely to the poles of the channel to be equalized, as illustrated by the equations above; thus their locations need to be selected with care when optimizing the CTLE parameters.

#### Pros

- An active CTLE provides gain and equalization with low power and area overhead.
- Cancels both precursor and long-tail ISI.

#### Cons

- Equalization is limited to first-order compensation.
- Noise and crosstalk are amplified.
- Very sensitive to PVT variations and hard to tune.
- The speed is limited by the gain-bandwidth of the amplifier.

### 2.1.3 Design of CTLE

Linear equalizers are commonly designed in the frequency domain. Figure 2.1.5 (a) and (b) show the Bode plots of the CTLE with one and two zeros, respectively. Each zero produces a  $+20\text{ dB/decade}$  rise in the frequency response. For some legacy and lossy channels, one zero alone is not adequate to fully reverse the effect of the channel, and a second zero can be introduced to add another  $+20\text{ dB/decade}$  over part of the frequency range. The poles locate the peaking frequency and determine the roll-off of the high-frequency response. However, optimally locating the zeros of a higher-order CTLE using iterative search is challenging and prone to local minima. Equalizer filters can be designed with standard filter design techniques using passive or active components. Equalizers using active components can provide gain greater than unity and can easily be integrated in silicon. One of the common CTLE types is the source-coupled differential pair with source degeneration, shown in figure 2.1.5 (c). The high-frequency boost is achieved by introducing a real zero using the parallel resistor and capacitor network.



Bode plots of CTLE with (a) one and (b) two zeros, and (c) a CTLE circuit using capacitive degeneration.

Fig. 2.1.5

The load resistance  $R_L$  and the output parasitic capacitance  $C_P$  (written  $C_L$  in the transfer function below) introduce an additional high-frequency pole. To obtain the desired boost, the value of  $R_s$  must be chosen as a trade-off between the DC gain and the high-frequency boost. This corresponds to a compromise between sensitivity, dynamic range, noise and offset tolerance on one side and the capability to match the channel on the other. At high frequency, the degeneration capacitor shorts the degeneration resistor and creates peaking. The peaking and DC gain can be tuned by adjusting the degeneration resistor and capacitor.

The transfer function of the CTLE shown in (c) in figure 2.1.5 is given by:

$$H(s) = \frac{g_m}{C_L} \cdot \frac{s + \frac{1}{R_s C_s}}{\left(s + \frac{1+g_m R_s/2}{R_s C_s}\right)\left(s + \frac{1}{R_L C_L}\right)}, \quad \text{equ. 2.1.4}$$

The real zero and poles are given by:

$$\omega_z = \frac{1}{R_s C_s}, \quad \omega_{P_1} = \frac{1 + g_m R_s / 2}{R_s C_s}, \quad \text{and} \quad \omega_{P_2} = \frac{1}{R_L C_L}. \quad \text{equ. 2.1.5}$$

The pole  $\omega_{P_1}$  is designed to be higher than the zero frequency to achieve high-frequency peaking gain. The amount of peaking is controlled by the ratio of this pole to the zero frequency, and the gain at the peaking frequency can be approximated by:

$$A_{max} = A_o \cdot \frac{\omega_{P_1}}{\omega_z}. \quad \text{equ. 2.1.6}$$

Where  $A_o$  is the low-frequency gain (DC gain) of the CTLE given by:

$$A_o = \frac{g_m R_L}{1 + g_m R_s / 2}. \quad \text{equ. 2.1.7}$$

The frequency response of a first-order CTLE is shown in figure 2.1.6:



Frequency response of first-order CTLE.

Fig. 2.1.6

The bandwidth of the first-order CTLE is limited by the parasitic pole  $\omega_{P_2}$ . In fact,  $\omega_{P_2}$  will reduce the maximum peaking gain achievable in the first-order design if its location is near to  $\omega_{P_1}$ .
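The relations above can be collected into a small calculator. This is a sketch under stated assumptions rather than the sizing script used for our design: the component values are hypothetical, and the pole produced by the degeneration network and the pole produced by the load are given descriptive names taken from the transfer function in equ. 2.1.4.

```python
import math


def ctle_parameters(g_m, r_s, c_s, r_l, c_l):
    """Zero, poles and gains of the degenerated differential pair (equ. 2.1.4)."""
    w_zero = 1 / (r_s * c_s)
    w_pole_deg = (1 + g_m * r_s / 2) / (r_s * c_s)  # pole set by the degeneration network
    w_pole_load = 1 / (r_l * c_l)                   # parasitic pole set by the load
    dc_gain = g_m * r_l / (1 + g_m * r_s / 2)       # equ. 2.1.7
    peak_gain = dc_gain * w_pole_deg / w_zero       # ideal peak, equals g_m * r_l
    return {
        "f_zero_ghz": w_zero / (2 * math.pi * 1e9),
        "f_pole_deg_ghz": w_pole_deg / (2 * math.pi * 1e9),
        "f_pole_load_ghz": w_pole_load / (2 * math.pi * 1e9),
        "dc_gain": dc_gain,
        "peak_gain": peak_gain,
        "peaking_db": 20 * math.log10(peak_gain / dc_gain),
    }


# Hypothetical component values, chosen only for illustration.
params = ctle_parameters(g_m=20e-3, r_s=200.0, c_s=1e-12, r_l=100.0, c_l=50e-15)
for name, value in params.items():
    print(f"{name}: {value:.3f}")
```

The ideal peaking equals $1 + g_m R_s/2$, so a larger degeneration resistor raises the boost at the cost of DC gain. The load pole must sit well above the degeneration pole for the circuit to approach this ideal value.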

In order to increase the boost a chain of degenerated differential pairs can be used.

## 2.1.4 Simulation Results

The CTLE is tested with different channels to check that it performs as expected. The following figures show results for some typical backplane channels:

For a channel with  $S_{21} = -5.3$  dB, the AC response is shown in figure 2.1.7 and the eye diagram is shown in figure 2.1.8.



Fig. 2.1.7



Fig.2.1.8

For a channel with  $S_{21} = -9.5$  dB, the AC response is shown in figure 2.1.9 and the eye diagram is shown in figure 2.1.10.



Fig. 2.1.9



Fig. 2.1.10

Our CTLE can be adapted for 8 different boosting levels as follows:

| Rdeg ( $\Omega$ ) | Boosting (dB) |
|-------------------|---------------|
| 40                | 0             |
| 65                | 1.5           |
| 110               | 3             |
| 175               | 4.5           |
| 250               | 6             |
| 450               | 7.5           |
| 900               | 9             |

Table of characteristics of the CTLE

| Parameter         | Value     |
|-------------------|-----------|
| Power consumption | 7.843 mW  |
| Technology        | 65nm      |
| Supply voltage    | 1V        |
| Range             | 0 – 10 dB |
| Data rate         | 10 Gbps   |

## 2.1.5 Conclusion

Equalizer circuits are popular solutions for extending the bandwidth of band-limited channels in high-speed transceiver systems. The CTLE is an example of a receiver equalizer that provides high-frequency boost to compensate for the low-pass effect of the non-ideal channel. An analysis of the first-order CTLE is presented. Simulation results show that the CTLE can restore the bandwidth of the signal after the channel attenuation. In eye diagram simulations, the CTLE improves the received eye voltage opening, eye time opening (UI), jitter, and amplitude noise.

## Chapter (2)

### Variable Gain Amplifier (VGA)

#### 2.2.1 Design

The channel severely attenuates the data, so the received 10 Gbps data must be amplified to a level suitable for the following stage (the ADC). This amplifier must be highly linear and add minimal noise to the received data. Fig. 2.2.1 shows the design used.

The gain is varied by changing the degeneration resistance. Thermometer-coded switches, turned on and off by digital controls, introduce additional resistors in parallel to adjust the output swing to the required value. The amplifier consists of 8 stages.



Fig. 2.2.1

The equation of the amplifier gain is given by :

$$A = \frac{gm \cdot Rd}{1 + gm \left( \frac{Rs}{2} \right)} \quad \text{equ. 2.2.1}$$
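As a rough numerical illustration of equ. 2.2.1, the sketch below evaluates the gain for a set of degeneration settings. The transconductance `GM` and load resistance `RD` are placeholder values chosen only for illustration, not values taken from the design, and `Rs` is assumed to correspond to the Rdeg values tabulated in section 2.2.2.

```python
import math


def vga_gain(gm, rd, rs):
    """Low-frequency gain of a resistively degenerated differential pair (equ. 2.2.1)."""
    return gm * rd / (1.0 + gm * rs / 2.0)


# Hypothetical device values, for illustration only (not from the design).
GM = 20e-3   # transconductance in siemens
RD = 200.0   # load resistance in ohms

# Degeneration settings, assumed equal to the Rdeg column in section 2.2.2.
RDEG_OHMS = [450, 400, 325, 250, 200, 175, 150, 80]

for rdeg in RDEG_OHMS:
    gain = vga_gain(GM, RD, rdeg)
    gain_db = 20.0 * math.log10(gain)
    print(f"Rdeg = {rdeg:4d} ohm -> gain = {gain:.2f} V/V ({gain_db:.2f} dB)")
```

With these placeholder values the gain rises as Rdeg falls, which matches the trend of the table in section 2.2.2, where smaller Rdeg values are used for larger channel losses.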

## 2.2.2 Simulation Results

The amplifier is required to hold the output swing at 1.2 V, so its gain is adjusted to match different channel losses. The table below lists the Rdeg value used for each channel loss to keep the output swing constant:

| Rdeg ( $\Omega$ ) | Channel loss (dB) | -3dB bandwidth (GHz) |
|-------------------|-------------------|----------------------|
| 450               | 0                 | 5.9                  |
| 400               | 1.5               | 6.2                  |
| 325               | 3                 | 6.6                  |
| 250               | 4.5               | 7.14                 |
| 200               | 6                 | 7.6                  |
| 175               | 7.5               | 7.8                  |
| 150               | 9                 | 8.81                 |
| 80                | 10.5              | 9.4                  |

The following figure shows the eye diagram with 1.2 Vpp after passing by the CTLE and VGA



Fig.2.2.1

The following plot shows one of the VGA settings. The bandwidth is sufficient for 10 Gbps operation, so there is no bandwidth limitation, which is one of the advantages of this topology.



Fig. 2.2.2

The following figure shows the AC response for the different settings of the VGA:



Fig. 2.2.3

Power consumed = 3.2 mW

## References

- [1] Ahmed Adel, Ahmed Arafa, Dina Reda, "Equalizer Implementation for 10 Gbps Serial Data Link in 90 nm CMOS Technology", IEEE ICM - December 2007.




## **Chapter (3)**

### **Analog to Digital Conversion**

#### **2.3.1 Introduction**

This part focuses on high-speed, low-power A/D converters. ADCs are part of a mixed analog/digital system, so the digital part can be used for processing of information whenever necessary. Nonetheless, the digital part does not come for free and has to be taken into account when evaluating the overall performance.

In this work, the main goal is to improve the performance of the digital receiver through a high-speed, low-power ADC. The performance is evaluated by means of a widely accepted figure of merit (FOM) that includes speed, accuracy and power consumption. The FOM is used because it is an important property of each ADC, independent of the application or situation in which the ADC is used. As a second goal, this work (i.e. placing an ADC inside the Rx) aims to use circuit solutions that are portable to future technologies; as a future trend, such work can also be used for multi-standard systems.

This chapter starts with a study of the current state of the art in ADC design. Then we discuss which ADC fits our application (i.e. a high-speed serial link digital receiver).

#### **2.3.2 Types of ADCs**

ADCs can be implemented in a number of different architectures depending on their applications. These architectures can be divided into three categories:

- Low-speed High-resolution
- Medium-speed Medium-resolution
- High-speed Low-resolution

Examples include SAR, flash, pipeline, cyclic, folding and two-step converters. For our design we compare mainly three of them: pipeline, flash, and folding.

##### **2.3.2.1 Pipeline ADC**

Pipeline ADCs are the most suitable architecture for sampling rates ranging from a few megasamples per second (Msps) to several hundred Msps. They can achieve a resolution of 8 to 16 bits depending on the conversion speed. The major advantage of pipeline ADCs is that their power dissipation grows *linearly* with the number of bits. They have a wide range of applications including fast Ethernet, digital receivers, and cable modems [1]. In recent years, pipeline ADCs have improved in speed, resolution, dynamic performance and power consumption. They are mostly used in systems where high dynamic performance is needed. Pipeline ADCs can be used in high-speed systems by interleaving. The disadvantage of interleaving is a reduction in SNR because of mismatches between the interleaved paths. Interleaving also increases the area and the power consumption.

### **2.3.2.2 Flash ADC**

Flash ADCs, also called *parallel* ADCs, are suitable for applications where *very high conversion speed* is required. They are limited to a resolution of *six* to *eight* bits because the number of comparators *doubles* when the resolution is increased by *one* bit. Because of the large number of comparators, they consume more power and are very costly at higher resolutions. The major applications of flash ADCs are in satellite communication, radar processing, data communications and real-time oscilloscopes [2].

### **2.3.2.3 Folding ADC**

For applications requiring both high speed and high resolution, folding ADCs can be used. They can achieve conversion rates in the GHz range with a resolution of 8 to 12 bits. In contrast to flash ADCs, where the number of comparators grows as  $2^n$  with  $n$  the number of bits, folding reduces the number of comparators by  $2^m$ , where  $m$  is the folding factor. These ADCs therefore achieve speeds close to those of flash ADCs with less power consumption. They are used in video applications. The major disadvantage of the folding architecture is that it is highly susceptible to device mismatch [3].

## **2.3.3 Choice of Converter**

Table 2.3.1 shows the comparison of three different ADC architectures.

| Type     | Resolution | Speed  |
|----------|------------|--------|
| Pipeline | High       | Medium |
| Flash    | Low        | High   |
| Folding  | Medium     | High   |

Table 2.3.1: Comparison table of three fast ADC architectures.

Figure 2.3.1 shows the Resolution versus signal bandwidth



Ref: S. Chen, R. Brodersen, "A 6-bit 600-MS/s 5.3-mW Asynchronous ADC in 0.13-$\mu$m CMOS", IEEE J. of Solid-State Circuits, Vol. 41, No. 12, December 2006.

Fig.2.3.1

Among the high-speed architectures, the flash ADC can achieve the highest sampling rates because its only analog building block is the *comparator* [4].

The operation of a flash ADC is very simple: it compares the analog input signal with a number of reference voltages and produces the output. The parallel architecture allows the conversion to complete in one clock cycle. At the same conversion rate, the flash ADC therefore has low latency, where latency is the number of clock cycles required by an ADC to convert a given input to an output. Thus, the flash architecture is *extremely* fast and allows conversion rates of *several GHz*. The main problem with the flash ADC is its power consumption, which increases significantly with the number of bits and limits the practical resolution to *six* to *eight* bits.

The effective number of bits (ENOB) of an ADC can be calculated using the formula [4]:

$$ENOB = \frac{SINAD - 1.76}{6.02}$$

Given the signal amplitude and the noise, the SNR can be found, and from it the ENOB of the ADC can be calculated. To allow for non-ideal effects, the ADC should be designed with one or two more bits than desired.
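The ENOB relation can be evaluated in both directions with a few lines of Python. This is only a minimal sketch of the formula quoted above; the function names are illustrative and the resolutions in the loop are arbitrary examples.

```python
def enob_from_sinad(sinad_db):
    """Effective number of bits for a given SINAD in dB."""
    return (sinad_db - 1.76) / 6.02


def sinad_for_bits(bits):
    """SINAD in dB that corresponds to a given effective resolution."""
    return 6.02 * bits + 1.76


# Example resolutions, chosen only for illustration.
for bits in (3, 4, 6, 8):
    print(f"{bits} bits <-> SINAD of {sinad_for_bits(bits):.2f} dB")
```

Read in this direction, the formula indicates how much SINAD the analog front end has to deliver before a given number of bits is actually useful.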

According to the standard we are working with (i.e. 10GbaseKR), the ADC must be high speed (10 GS/s). The number of bits is determined by the channel attenuation: 4 bits are needed if the channel attenuation exceeds 30 dB, while a 3-bit ADC is sufficient for channels with less than 20 dB attenuation [5]. Hence the ADC we need is a high-speed, low-resolution ADC, so the flash ADC is the best choice and is used for this purpose in this project.

## 2.3.4 Concept of Time-Interleaving

Designing a single ADC that runs at the full required rate would seriously affect the power consumption, as shown in figure 2.3.2. To push the design further, parallel or time-interleaved (TI) architectures are one of the most effective solutions for boosting the maximum speed of analog interfaces at the system level.

There are several reasons for this. First, due to the fixed parasitic capacitance (associated with interconnect and certain device parasitics), active device sizes need to be scaled up to increase the speed. When active device sizes become so large that their scalable parasitic capacitance dominates the fixed parasitic capacitance, any additional increase in speed requires a very large increase in device size and, thus, power consumption. The second reason is more applicable to CMOS-like regenerative amplifiers. The  $CV^2f$  power of a single regenerative amplifier stage scales linearly with frequency. However, as the time available for regeneration shortens, the gain per regenerative stage drops exponentially, requiring more cascaded stages for a given total gain target. This, again, results in power consumption scaling with frequency much faster than linearly. In principle, the operating *speed* of the TI-ADC grows linearly with the number of parallel ADC channels. However, various types of mismatch among the channels, such as offset, gain, timing and bandwidth mismatch (the latter from the sampling RC time constant), create modulation tones that degrade the performance of the TI-ADC [11-20]. These mismatch effects must be fully characterized to achieve satisfactory performance in the design of TI-ADCs.



Fig.2.3.2: Power savings from interleaving

Figure 2.3.3 shows an example of a TI-ADC architecture, which uses M identical sub-ADCs arranged in parallel. The sub-ADC in each time-interleaved path operates at a sampling frequency of  $f_s/M$ , where  $f_s$  denotes the overall equivalent sampling frequency of the whole system and  $M$  is the number of branches. Figure 2.3.4 shows the clocking waveforms for such a parallel architecture. The sampling clock applied to each path is delayed with respect to the previous one by one overall sampling period  $T = 1/f_s$ , so the sub-ADCs sample the analog input signal in turn. By employing a parallel architecture, the effective sampling rate is multiplied by the number (M) of TI paths.
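The sampling order described above can be sketched as a simple schedule. The overall rate of 10 GS/s is taken from the text. The number of branches `M = 4` is an assumption made for illustration; it is chosen because 10 GS/s divided by 4 equals the 2.5 GHz sampling frequency listed for the THA in Table 2.4.2.

```python
def sub_adc_rate(fs_hz, m):
    """Sampling rate of each sub-ADC in an M-way time-interleaved converter."""
    return fs_hz / m


def interleave_schedule(fs_hz, m, n_samples):
    """Return (sample index, sub-ADC index, sampling instant in seconds) tuples."""
    period = 1.0 / fs_hz  # overall sampling period T = 1/fs
    return [(k, k % m, k * period) for k in range(n_samples)]


FS = 10e9  # overall sampling rate in samples per second
M = 4      # assumed number of interleaved branches (illustrative)

print(f"each sub-ADC runs at {sub_adc_rate(FS, M) / 1e9:.2f} GS/s")
schedule = interleave_schedule(FS, M, 8)
for k, adc, t in schedule:
    print(f"sample {k} -> sub-ADC {adc} at t = {t * 1e12:.0f} ps")
```

Each sub-ADC only sees every M-th sample, so it has M overall periods to complete a conversion, which is where the relaxed per-channel speed requirement comes from.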



Fig.2.3.3 Interleaved ADC



Fig.2.3.4 Timing plan of interleaved ADC

# **Chapter (4)**

## **High Speed Flash ADC**

As seen in the previous chapter, the best-known architecture for a high-speed analog-to-digital converter is the *flash* converter. The aim of our project is to design a high-speed ADC with low power consumption.

### **2.4.1 Components of Flash ADC**

In a flash ADC, an array of comparators compares the input voltage with a set of increasing reference voltages. The comparator outputs represent the input signal in a thermometer code, which is then converted into binary code. From this description, almost every flash ADC comprises the following blocks:

1. Track and Hold Circuit
2. Resistor Ladder Block
3. Comparator Block
4. Encoder Block

Both the Track and Hold circuit (THA) and the Encoder Block can be optional. Each comparator samples the input signal before comparing it with a reference, so a separate THA is not strictly required, although this sampling may not be accurate for high-frequency input signals. As for the encoder, if the ADC is followed by a DSP unit, the encoding can be performed as one of the DSP functions.

The block diagram of a conventional flash ADC is shown in figure 2.4.1. The signal coming from the track and hold is compared with a number of reference voltages which are generated by a reference circuit. If the input voltage is higher than the reference voltage, the comparator gives '1' at its output, and if the input voltage is lower than the reference voltage, the comparator gives '0'. The code produced by the comparator array is called a *thermometer* code; it is either converted to a binary output by an encoder or forwarded directly to the digital processing unit, which then makes the decisions [4]. Because of its high parallelism, the flash architecture often has a high input capacitance in comparison with other architectures.
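The comparison step can be sketched as a small behavioural model. The sketch assumes a 4-bit converter with a 1 V full scale and evenly spaced, single-ended reference levels; these values and the function name are illustrative and do not represent the differential circuit designed in this work.

```python
def flash_convert(vin, vref_levels):
    """Compare vin with every reference; return the thermometer code, lowest reference first."""
    return [1 if vin > vref else 0 for vref in vref_levels]


N_BITS = 4
FULL_SCALE = 1.0                     # volts, assumed for illustration
LSB = FULL_SCALE / 2 ** N_BITS
# 2^N - 1 evenly spaced reference levels (an idealised ladder).
VREFS = [(k + 1) * LSB for k in range(2 ** N_BITS - 1)]

thermo = flash_convert(0.40, VREFS)
print("thermometer code:", "".join(str(bit) for bit in thermo))
print("output code:", sum(thermo))  # number of comparators that output '1'
```

All comparators decide at the same time, so the conversion takes a single comparison step regardless of the resolution; the cost is the number of comparators that the input has to drive.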



Fig.2.4.1 Main architecture of the Flash ADC

## 2.4.2 Track and Hold (THA)

### 2.4.2.1 Overview

In conventional flash ADC designs, there was no requirement to precede the comparator array with a track-and-hold circuit because the correct quantization level is decided within one clock cycle. However, the absence of a track-and-hold tightens the circuit requirements and introduces several error sources. For example, timing skew between the comparators results in signal-dependent distortion, because the comparators sample the input signal at different time instants. This is a serious problem for high-frequency input signals, so for such designs it is better to place a THA in front of the comparator array. It presents a steady value to the comparators and relaxes their design.

A basic passive THA circuit, as shown in figure 2.4.2, consists of a sampling switch and a hold capacitor. During the first half of the clock cycle it tracks the signal (the *tracking* mode), and during the second half it holds this value on the hold capacitor for subsequent processing (the *hold* mode). The switch and the hold capacitor form an RC network whose time constant determines the bandwidth (BW) that can be achieved [6, 7].



Fig. 2.4.2 Basic Track and Hold Circuit

A T&H circuit is necessary in a high-speed flash ADC to avoid clock dispersion; it improves the dynamic behavior and reduces the errors due to aperture jitter and clock skew [8]. This is because the THA circuit is very small compared to the whole ADC: clock skew and jitter affect only the front-end THA circuit, and the rest of the ADC sees stable signals at its input. The overall performance of the ADC therefore depends highly on the performance of the THA circuit [9, 10].

There are two options for the track-and-hold circuit: open-loop and closed-loop solutions. The high-speed nature and moderate SNR requirement make an operational amplifier connected in a feedback configuration a *less* favorable solution, even though such circuits prevail in higher-resolution architectures like pipelined ADCs. To reach the desired speed, the OTA would be expected to consume an unacceptable amount of power.

When comparing the open-loop and closed-loop solutions with respect to speed, accuracy and FOM, the following can be observed:

- Speed: the open-loop solutions achieve a higher speed (90MHz - 10GHz) compared to the closed-loop solutions (240MHz).
- Accuracy: on average, the linearity of the closed-loop solutions (50dB - 78dB) is better than the linearity of the open-loop solutions (28dB - 63dB).
- FOM: both closed-loop and open-loop solutions can achieve a FOM below 100fJ.

So, to reach the goal of our design we choose to take the open-loop solution.

#### **2.4.2.2 Design goal**

A source follower has some desirable attributes, such as low output resistance and high driving capability, and is widely used in flash ADC designs [21], [22], [23]. A source follower is therefore chosen for our design. To accommodate the differential input, a pseudo-differential structure is implemented.

The achieved performance should be stated in terms of both linearity and noise (expressed as SNDR or ENOB); in most cases, however, only the linearity (expressed as SFDR or THD) is given while noise is neglected.

#### **2.4.2.3 T&H Circuits**

A general view of a differential open-loop T&H circuit is given in fig. 2.4.3. The analog time-continuous input signal is sampled onto the sampling capacitors by means of switches. The switches are controlled by an externally applied clock signal through a switch driver. An open-loop output buffer is used to drive the load (the ADC) without affecting the sampled value on the capacitors.



Fig.2.4.3 Track and hold Architecture

It should be noted that this architecture is actually composed of two open-loop structures: the first one is the sampling structure itself (switches, switch drivers and capacitors), and the second one is the output buffer which is to be implemented as an open-loop circuit as well. First, the sampling structure will be discussed. Then, two alternative implementations for the open-loop buffer will be analyzed.

#### 2.4.2.4 Sampling core architecture

The actual core of the T&H circuit is the sampling circuit, composed of the sampling capacitors, switches and switch drivers. The size of the sampling capacitors was set to 200fF, such that for a full-scale input sine (1Vpp) an SNR of around 64dB is achieved. The switches use the bootstrapping technique presented in [24] to achieve both high speed and high linearity.
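The quoted SNR can be cross-checked with a kT/C estimate. The sketch below is not the authors' derivation: it assumes that thermal sampling noise is the only noise source, a temperature of 300 K, and that the two sampling capacitors of the differential structure each contribute kT/C of uncorrelated noise.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def ktc_snr_db(c_sample, vpp, temperature=300.0, n_caps=2):
    """SNR of a sampled full-scale sine limited only by kT/C noise (assumed model)."""
    noise_power = n_caps * BOLTZMANN * temperature / c_sample  # V^2
    signal_power = (vpp / 2.0) ** 2 / 2.0                      # sine of amplitude vpp/2
    return 10.0 * math.log10(signal_power / noise_power)


snr = ktc_snr_db(200e-15, 1.0)
print(f"kT/C-limited SNR: {snr:.1f} dB")
```

Under these assumptions the estimate comes out at about 64.8 dB, which is in line with the figure of around 64dB stated above.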

Using this technique, the actual switch can be implemented with a single NMOS device. High speed is obtained by driving the switch with a high overdrive voltage  $V_{gs} = V_{DD} = 1V$ , so a small transistor can be used as the switch, which in turn reduces the parasitic capacitance. Because of the high overdrive voltage, a small on-resistance is still achieved. In addition, as the bootstrapping technique generates a constant  $V_{gs}$  voltage, independent of the input signal  $V_s$  at the switch source, high linearity is achieved as well. The implementation of the switch driver is shown in fig. 2.4.4. The capacitors are pre-charged to act as an internal 1V battery. When CLK is low, the gate of the sampling switch is connected to ground to open the switch. When CLK is high, the 1V battery is connected between the source and the gate of the sampling switch, such that  $V_{gs} = V_{DD} = 1V$  and the switch is turned on.



Fig.2.4.4 bootstrapping technique applied to the sampling switch.

### 2.4.2.5 Output buffer

As we said before, two possible solutions for the required open-loop buffer will be reviewed. The first solution is based on a source-follower, the second on a differential pair. A comparison is made with respect to speed, power consumption, accuracy, mismatch sensitivity, controllability and power supply requirements.

#### 2.4.2.5.1 Source follower

A first option for an open-loop unity-gain buffer is a source follower (SF). A pseudo-differential source follower using NMOS transistors is illustrated in fig. 2.4.5. Transistors M1 and M2 are biased with a constant VGS (equal to VB), such that they generate a constant current in each SF. The differential input voltage is applied at the gates of transistors M3 and M4. As a first order approximation, a constant current flows through each of these transistors, resulting in a constant Vgs. As such, the source potentials track the gate potentials, generating a differential output voltage equal to the differential input voltage. At the same time, because of the Vgs voltage drop across transistors M3 and M4, the common-mode level at the output is lower than the common-mode level at the input. In situations where a common-mode level-shift is undesirable, a second level-shift is necessary to compensate for it. In this section, it is assumed that there is no specific constraint on the common-mode level, such that a second level-shift can be omitted.



Fig.2.4.5 pseudo-differential source follower

#### 2.4.2.5.2 Differential pair

An alternative for the output buffer is a differential pair with resistive load as shown in fig. 2.4.6. Transistor M1 is used as current source, setting the overall current. The differential pair converts the differential input voltage to a differential current. Then, this current is converted to an output voltage by means of the resistors.



Fig.2.4.6 differential pair

In contrast to the situation with the source follower, no inherent level-shift is present in this case. As long as proper biasing of all devices can be maintained, the output common-mode can be set independent of the input common-mode.

#### 2.4.2.5.3 Comparison between the two topologies

- **Power consumption and speed**

Both architectures introduce two time-constants which limit the speed of the T&H. The first time-constant is related to the output resistance of the preceding stage combined with the input capacitance of the buffer. Note that the input capacitance of the buffer is placed in parallel with the sampling capacitors. For higher accuracies, the sampling capacitance normally dominates the buffer capacitance because of noise requirements, so this time-constant depends only weakly on the buffer design. The second time-constant is related to the output resistance ( $r_{out}$ ) of the output buffer combined with the input capacitance of the ADC (the load capacitance). As the load capacitance is assumed to be the same for both alternatives, this time-constant depends only on the output resistance. The gain of the source follower is nearly unity, and the differential pair is designed such that:

$$g_m \cdot r_{out} = 1$$

As a result, with respect to the tradeoff between power consumption and speed, the two circuits perform *identically*.

- **Accuracy, mismatch sensitivity and controllability**

To first order, a source follower is perfectly linear and achieves unity gain independent of the exact current  $I$  or the transistor dimensions. This means that the linearity of the buffer is not adversely affected by component mismatch, process spread or a deviation of the biasing conditions. On the other hand, this also implies that the designer has *little control* over the realized gain and linearity, as these properties are relatively insensitive to the main design parameters, namely bias current and transistor dimensions. In practice, the accuracy of the source follower is limited by secondary effects (e.g. channel-length modulation and body effect [25]), which introduce both signal distortion and gain drop. The severity of these secondary effects, and therefore the accuracy of a source follower circuit, depends on the design and the technology used.

A differential pair, by contrast, is inherently a non-linear circuit, even when using the simplified transistor equations. Its gain does not approximate unity by default but can be chosen by the designer. As gain and linearity depend on the first-order model, they can be well controlled by the designer using the main design parameters such as bias current, transistor dimensions and resistor values. At the same time, the relatively high sensitivity of the performance to first-order effects implies that the actual performance will be sensitive to component mismatch as well.

| Topology             | Source Follower                                                              | Differential Pair                 |
|----------------------|------------------------------------------------------------------------------|-----------------------------------|
| Gain                 | Determined by second order effects (body effect, channel length modulation) | Determined by first order effects |
| Linearity            | Determined by second order effects (body effect, channel length modulation) | Determined by first order effects |
| Mismatch sensitivity | Low                                                                          | High                              |
| Controllability      | Low                                                                          | High                              |
| Design freedom       | Low                                                                          | High                              |

Table 2.4.1 comparison between SF and DP

- **Power supply requirement**

From analyzing the stack of the source follower we can find out that:

$$\min(VDD) = 2 V_{ov} + V_{th} + V_{fs}$$

And for the differential pair:

$$\min(VDD) = 2 V_{ov} + 2 V_{fs}$$

So if  $V_{th} < V_{fs}$ , the source follower requires the lower supply voltage and is the better choice.
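Subtracting the two expressions makes this condition explicit. The subscripts SF and DP are added here only to tell the two cases apart; all other symbols are used as in the expressions above:

$$\min(VDD)_{SF} - \min(VDD)_{DP} = (2 V_{ov} + V_{th} + V_{fs}) - (2 V_{ov} + 2 V_{fs}) = V_{th} - V_{fs}$$

The difference is negative exactly when  $V_{th} < V_{fs}$ , in which case the source follower can operate from the lower minimum supply.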

### Conclusion:

As we intend to design an interleaved ADC, mismatch must be kept to a minimum. In addition,  $V_{th} < V_{fs}$  in our design specifications. Consequently, we choose the source follower topology as the better candidate for our design.

#### 2.4.2.6 Circuit



Fig.2.4.7 Designed Circuit of THA



Fig.2.4.8 Input and output of the THA with clock

#### 2.4.2.7 Simulations

The circuit passes corner simulations for a  $\pm 10\%$  variation in the supply and a temperature range from 0 to 85 degrees.

#### 2.4.2.8 THA Specifications

| Parameter                  | Value           |
|----------------------------|-----------------|
| Process                    | 65 nm           |
| Power supply               | 1 V             |
| Signal range $v_{in,pp}$   | 1.2 V           |
| Input common mode voltage  | 0.6 V           |
| Output common mode voltage | 0.3 V           |
| SFDR @ 1.25 GHz input      | 31 dB           |
| Supported Load             | 300fF           |
| Topology                   | Source follower |
| FOM_SFDR                   | 34.21 fJ        |
| Sampling frequency ( fs )  | 2.5 GHz         |
| Power consumption          | 2.44 mW         |

Table 2.4.2 Specification of THA
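The FOM_SFDR entry can be approximately reproduced from the other rows of Table 2.4.2. The definition used in this work is not stated, so the sketch below assumes a Walden-style figure of merit in which the effective number of bits is derived from the SFDR; the function names are illustrative.

```python
def effective_bits(sfdr_db):
    """Effective resolution derived from SFDR (assumed definition)."""
    return (sfdr_db - 1.76) / 6.02


def fom_joules_per_step(power_w, fs_hz, sfdr_db):
    """Walden-style figure of merit: power / (fs * 2^ENOB), an assumed formula."""
    return power_w / (fs_hz * 2.0 ** effective_bits(sfdr_db))


# Values taken from Table 2.4.2.
fom = fom_joules_per_step(power_w=2.44e-3, fs_hz=2.5e9, sfdr_db=31.0)
print(f"FOM = {fom * 1e15:.1f} fJ per conversion step")
```

With the tabulated values this gives roughly 33.7 fJ, close to the 34.21 fJ reported in the table. The small difference may be due to the SFDR being rounded to 31 dB, although that is only a guess.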

Note also that the output common-mode level of the SF is low, which means that the following stages have to handle a low input common mode (i.e. use PMOS input devices).

### 2.4.3 Reference Ladder

An accurate design of a fully differential reference ladder is crucial to the correct functioning of the entire ADC system. The integration of reference ladders into the ADC system can be successful only if due consideration is given to the various errors that can act upon the design, thereby generating improper voltage values and affecting the output of the converter. The reference generator used in flash ADCs usually consists of one or two chains of resistors.

Two chains are required when differential comparators are used. In this design the input and output signals are fully differential, hence both chains are used. The reference ladder is built from resistors to provide stable voltage divisions for all comparators. However, due to mismatch between the various nets and components, the effective resistances in the reference ladder tend to deviate from their typical values. As a result, the voltages output by the ladder also deviate from their typical values. This variation of the reference voltages causes a nonlinear transfer function in the ADC system, resulting in multiple harmonics at its output. Another potential error is the feed-through of the input signal to the reference ladder outputs, as shown in fig. 2.4.9. This feed-through occurs because of the parasitic capacitances between the inputs of the comparators. It can be reduced by designing the reference ladder with a bias current large enough to overcome the harmonics at the output, resulting in a stable voltage level along the net. The total resistance of the reference ladder must therefore be low enough to output a stable voltage, but not so low that it draws excessive power from the supply [26].



Fig. 2.4.9 feed-through between the input and reference value

Reference voltages, as shown in fig. 2.4.10, are chosen carefully with respect to the full-scale input (i.e. the dynamic range of the ADC) to divide this range into 16 regions. The ADC then uses an array of 15 comparators to rapidly decide in which region the input lies.



Fig.2.4.10 Reference levels used in this work

The feed-through, mismatch and kick-back noise become clear when a single reference level is inspected, as in fig. 2.4.11. The maximum deviation from the nominal value is  $\pm 3.6 \text{ mV}$ ; this variability can be treated as an offset for the next stage, the comparator.



Fig 2.4.11 Kickback noise and feed-through effect on the reference voltage

The ADC has a dynamic range (DR) of 1 V, and its least significant bit value is

$$LSB = \frac{FSR}{2^{BITS}} = \frac{1}{2^4} = \frac{1}{16} = 62.5 \text{ mV}$$

Hence the difference between any two adjacent reference values is 62.5 mV.
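The LSB arithmetic above can be checked numerically together with the reported reference variability. The sketch models an idealised single-ended ladder of equal resistors between two rails, which is a simplification of the differential ladder used in this work; the rail voltages and the function name are assumptions.

```python
def ladder_taps(v_bottom, v_top, n_bits):
    """Tap voltages of a string of 2^n equal resistors between two rails."""
    n_res = 2 ** n_bits
    step = (v_top - v_bottom) / n_res
    return [v_bottom + k * step for k in range(1, n_res)]


N_BITS = 4
FSR = 1.0                 # full-scale range in volts, from the text
lsb = FSR / 2 ** N_BITS
taps = ladder_taps(0.0, FSR, N_BITS)  # rails assumed at 0 V and FSR

variability = 3.6e-3      # reported maximum deviation of a reference level, in volts

print(f"LSB = {lsb * 1e3:.1f} mV, {len(taps)} reference taps")
print(f"reference variability = {100 * variability / lsb:.1f} % of one LSB")
```

The reported deviation of 3.6 mV corresponds to a few percent of one LSB, well inside the half-LSB margin.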

| Parameter                        | Value        |
|----------------------------------|--------------|
| Power                            | 16.83 mW     |
| Single resistance value          | 4 ohm        |
| Resistance Stack                 | 116.7 ohm    |
| Variability of a reference value | $\pm 3.6$ mV |

Table 2.4.3 Specification of Reference ladder

#### 2.4.4 Comparator Design

A comparator can be seen as a 1-bit ADC: it acts as a one-bit quantizer. If the input signal is higher than a threshold, called the reference, the comparator gives “1” (i.e. logic high); otherwise it gives “0” (i.e. logic low). An N-bit flash ADC requires  $2^N - 1$  comparators, and for every additional bit the number of required comparators doubles, which increases both power and area. For this reason flash ADCs are not suitable for high resolution. The raw comparator outputs are not practical to process in the DSP, so an encoder is needed to produce a binary output for further processing. The comparator is the main building block of the flash ADC, and any type of ADC must contain at least one comparator, which makes its design a cornerstone of the ADC. The comparator should be able to resolve a small input signal into a full-swing digital signal; it is in effect an amplifier with large gain, but without any requirement on the linearity of the amplification. This allows the use of positive feedback to realize the large gain, often through cross-coupled inverter pairs.

CMOS comparators can be implemented in many ways, and the circuit topology often changes slightly from one publication to another. Two main types can nevertheless be distinguished: the pre-amplifier and latch comparator, and the sense-amplifier based comparator. The comparator consists of two main blocks, as shown in fig. 2.4.12:

- i. Pre-Amplifier
- ii. Regenerative latch



Fig.2.4.12 Basic building blocks of the comparator

##### 2.4.4.1 Pre-Amplifier Design

The function of the pre-amplifiers (PAs) in flash ADCs is to amplify the voltage difference between the input signal and the reference voltage. They are used at the input of the comparators to suppress large dynamic offsets and reduce metastability errors [3]. A high pre-amplifier gain is needed to reduce the bit error rate (BER) by reducing the offsets of the comparators. The BW of the PAs should also be high to improve the settling time and avoid distortion at high input signal frequencies. It is difficult to achieve high gain with high bandwidth in one stage, so multiple pre-amplifier stages can be used to increase the gain [27, 28]. In addition to the large dynamic offsets of the comparators, the PAs also have their own static offsets. To reduce the offsets of the PAs, large transistor sizes can be used. These large transistors increase the input capacitance of the PAs, which in turn increases the load on the THA circuit. Another efficient way of reducing the PA offset is to use an averaging technique [29].

We used a differential difference amplifier (DDF). In a flash ADC, the location of the zero crossings is very important because the comparators only detect zero-crossing points. The digital output level of a comparator depends on the output of the PAs with respect to the zero crossings. Due to the offsets of the PAs, the zero-crossing points shift by the amount of the offset voltage and the linearity of the ADC is degraded. To remove these offsets, an averaging technique is employed which correlates the output voltages of the PAs so that offsets are reduced [29]. Each amplifier represents one zero crossing, and for a four-bit ADC 15 zero-crossing points are needed. This means that 15 amplifiers are needed in front of the comparators. It may be difficult for the track and hold circuit to drive a load of 15 PAs, so this has to be taken into consideration when designing the THA (that is why its supported load is made as large as 300fF). The input devices of the PA should be large enough to reduce the offset but small enough to avoid excessive parasitic capacitance. The topology chosen for the pre-amplifier is a PMOS DDF, as shown in fig. 2.4.13. PMOS is used because the output common mode of the THA is low, since an NMOS SF is used.



Fig.2.4.13 PMOS pre-amplifier

#### 2.4.4.2 Latch Design

After the pre-amplifier we use a regenerative latch. A static latch (i.e. an SR latch) can optionally follow it to hold the output; two latches in cascade form a flip-flop. To build this flip-flop we use a dynamic latch followed by a static SR latch.

Dynamic latches fall into two main architectures:

- CML latch
- Strong-Arm latch

As shown in fig. 2.4.14, the CML latch can support high data rates because it drives a large current through the circuit, which is also why it is not efficient for low-power design. For this reason the Strong-Arm latch is used in this work to keep the power consumption low; as shown in fig. 2.4.15, the Strong-Arm latch has no bias transistor.



Fig.2.4.14 CML Latch



Fig.2.4.15 Strong-Arm Latch

The second latch is the static latch (i.e. the SR latch) shown in fig. 2.4.16. It consists of two back-to-back inverters that form a positive feedback path for the outputs.



Fig.2.4.16 SR Latch

The output of the comparator (pre-amplifier, Strong-Arm latch and SR latch) indicates whether the signal is above or below the reference (i.e. threshold) value. An example is shown in fig. 2.4.17, which also shows the presence of an offset. The offset ranges from 2 mV to 13 mV depending on the reference level, kickback noise and mismatch between the input devices. It is acceptable as long as it is less than 1/2 LSB (i.e. 31 mV).
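The half-LSB criterion can be checked with a few lines of arithmetic. The sketch below assumes a 1 V full-scale range (inferred from the 1 Vpp stimulus used in the behavioral model, not stated explicitly for the comparator) together with the 4-bit resolution and the 13 mV worst-case offset quoted above.

```python
# Half-LSB offset budget for the 4-bit flash ADC (illustrative sketch).
N_BITS = 4            # resolution stated in the text
V_FULL_SCALE = 1.0    # ASSUMED full-scale range in volts (1 Vpp input)

lsb = V_FULL_SCALE / 2 ** N_BITS       # size of one quantization step
half_lsb = lsb / 2                     # maximum tolerable offset
num_comparators = 2 ** N_BITS - 1      # one comparator per zero crossing

worst_offset = 13e-3                   # worst-case offset quoted in the text
margin = half_lsb - worst_offset       # remaining headroom

print(f"LSB = {lsb * 1e3:.2f} mV, LSB/2 = {half_lsb * 1e3:.2f} mV")
print(f"comparators = {num_comparators}, margin = {margin * 1e3:.2f} mV")
```

Under these assumptions the half-LSB limit evaluates to 31.25 mV, consistent with the 31 mV figure above.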

Table 2.4.4 Specification of comparators

| Parameter                 | Value     |
|---------------------------|-----------|
| Power                     | 31.3 mW   |
| Maximum systematic Offset | 13 mV     |
| Delay                     | 88.4 psec |



Fig.2.4.17 output of comparator

### 2.4.5 Encoder design

The thermometer-to-binary encoder converts the thermometer-code output of the comparators into a binary output, as shown in Figure 2.4.17. The encoder can be realized in different ways: purely with logic gates, as a ROM encoder, or mux based. It may also include bubble-suppressing logic, which cancels the effect of bubbles that appear near the '1'-to-'0' transition. Bubbles further away from the correct transition point, however, require more complex circuits to correct. To reduce the effect of these bubble errors, the contents of the ROM can be Gray coded; then, when two nearby ROM lines are addressed simultaneously, the error is minimal.

In this work the outputs of the comparators (i.e. the thermometer codes) are taken directly to decimation buffers (i.e. a demux) that reduce the rate from 2.5 GHz to 1.25 GHz per line, so that the digital blocks (i.e. standard cells) can operate reliably. In the digital domain (i.e. the DSP) the thermometer codes are first converted to Gray code, to reduce the probability of bubble errors, and then converted to binary codes.
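The thermometer-to-Gray-to-binary conversion described above can be sketched in software. This is a minimal illustration for a 15-comparator (4-bit) array, not the DSP implementation used in this work; the function names are hypothetical.

```python
def thermometer_to_gray(thermo):
    """Map a 15-element thermometer code (comparator 1 first) to 4 Gray bits.

    Each Gray bit is the XOR of the comparator outputs located at the
    input levels where that bit toggles.
    """
    if len(thermo) != 15:
        raise ValueError("expected 15 comparator outputs")
    t = [0] + list(thermo)            # 1-based indexing: t[k] = comparator k
    g3 = t[8]
    g2 = t[4] ^ t[12]
    g1 = t[2] ^ t[6] ^ t[10] ^ t[14]
    g0 = t[1] ^ t[3] ^ t[5] ^ t[7] ^ t[9] ^ t[11] ^ t[13] ^ t[15]
    return [g3, g2, g1, g0]           # MSB first


def gray_to_binary(gray):
    """Convert MSB-first Gray bits to an integer."""
    value = 0
    bit = 0
    for g in gray:
        bit ^= g                      # binary bit = XOR of all Gray bits so far
        value = (value << 1) | bit
    return value


# A clean thermometer code with n ones should decode to n.
for n in range(16):
    code = [1] * n + [0] * (15 - n)
    print(n, thermometer_to_gray(code), gray_to_binary(thermometer_to_gray(code)))
```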



Fig. 2.4.17 thermometer-binary encoder

The full comparator architecture is shown in fig. 2.4.18. The bit rate at the output of the decimator is 1.25 Gbps; the outputs are then forwarded to the DSP core for processing. In this way the 10 Gbps stream is broken into eight 1.25 Gbps streams.
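The rate arithmetic can be illustrated with a small sketch. It assumes the stream is first shared by four time-interleaved sub-ADCs (2.5 GS/s each) and that each sub-ADC output then passes through a 1:2 decimation buffer; the helper name and the sample data are hypothetical.

```python
def split_round_robin(samples, ways):
    """Distribute a sample stream over `ways` slower streams."""
    return [samples[k::ways] for k in range(ways)]


LINE_RATE_GBPS = 10.0
N_ADC = 4        # ASSUMED number of interleaved sub-ADCs
N_DEMUX = 2      # ASSUMED 1:2 decimation buffer per sub-ADC

stream = list(range(16))                       # stand-in for 16 consecutive samples
adc_streams = split_round_robin(stream, N_ADC)
lanes = [lane for s in adc_streams for lane in split_round_robin(s, N_DEMUX)]
lane_rate = LINE_RATE_GBPS / (N_ADC * N_DEMUX)

print(len(lanes), "lanes at", lane_rate, "Gbps each")
```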



Fig.2.4.18 full architecture of comparator with its timing plan

The full architecture of the designed flash ADC is shown in fig. 2.4.19.



Fig. 2.4.19 the full architecture of Designed ADC

The extra CDR-Comparator is used for clock edge detection as will be explained in detail in the CDR part.

### 2.4.6 Flash ADC performance

Flash ADCs mainly target high-sampling-rate applications that are hard or impossible to reach with other architectures. Table 2.4.5 summarizes the performance of flash ADCs, showing the high sampling rates achieved in both CMOS and BiCMOS processes. Based on published results, the resolution of flash ADCs is typically limited to 5 bits, although higher resolutions have also been achieved with full flash architectures, as seen in the same table.

| Author<br>Year | Sampling<br>frequency<br>(fs) | Process                  | Power<br>Dissipation<br>(p) |
|----------------|-------------------------------|--------------------------|-----------------------------|
| [6]<br>2009    | 35 GS/s                       | 180 nm<br>SiGe<br>BiCMOS | 4.5 W                       |
| [8]<br>2008    | 1.25 GS/s                     | 90 nm<br>CMOS            | 207 mW                      |
| [9]<br>2004    | 4 GS/s                        | 130 nm<br>CMOS           | 990 mW                      |
| [10]<br>2008   | 5 GS/s                        | 65 nm<br>CMOS            | 320 mW                      |
| [11]<br>2006   | 1.25 GS/s                     | 90 nm<br>CMOS            | 2.5 mW                      |
| [30]<br>2009   | 2.5 GS/s                      | 90 nm<br>CMOS            | 30 mW                       |
| [31]<br>2009   | 1.5 GS/s                      | 90 nm<br>CMOS            | 23 mW                       |
| This work      | 10 GS/s                       | 65 nm<br>CMOS            | 150 mW                      |

Table 2.4.5: Flash ADC literature work

# Chapter (5)

## Behavioral Modeling using Matlab Models

Before beginning the circuit design of the ADC, a system-level model was built and tested through behavioral modeling in MATLAB-Simulink, with the aid of the Analog Mixed-Signal Library supported by MathWorks®.

### 2.5.1 Modified MATLAB Model from AMS Library



Fig. 2.5.1 ADC model



Fig. 2.5.2 Stimulus model for the ADC with 1 Vpp, 1.25 GHz sin input



Fig. 2.5.3 Output spectrum showing SFDR = 39 dB for 1.25GHz input

### 2.5.2 Detailed MATLAB Model for single ADC

A more practical behavioral model, which breaks the ADC down into its constituent blocks, is shown in the following figures.



Fig 2.5.4 Stimulus model for the more detailed modeled ADC with 1 Vpp, 1.25 GHz sin input



Fig. 2.5.5 Internal blocks of the Full flash ADC



Fig. 2.5.6 Comparator array



Fig. 2.5.7 Thermometer to Gray to binary encoder



Fig. 2.5.8 Model for 4 bit DAC



Fig. 2.5.9 Output spectrum of detailed model showing SFDR = 34.5 dB for 1.25GHz input

### 2.5.3 Detailed MATLAB Model for interleaved ADC

The next step was to interleave four of these ADCs and check their combined response.



Fig. 2.5.10 10Gbps Interleaved ADC model



Fig. 2.5.11 Output spectrum of the interleaved 10Gbps ADC showing SFDR = 31 dB for 4.99GHz input

### 2.5.4 Static performance for single ADC

The model was subjected to a static performance test using MATLAB code, which gives the following results:



Fig. 2.5.12 Ramp input and quantized output of ADC vs. time



Fig. 2.5.13 Voltage transfer characteristics (output vs. input)



Fig. 2.5.14 Quantization noise vs. time



Fig. 2.5.15 DNL of ADC model



Fig.2.5.16 INL of ADC model
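DNL and INL such as those plotted above can be derived from the code-transition voltages of the ramp test. The sketch below is a Python illustration rather than the MATLAB code used in this work; the 1 V full scale and the single 13 mV threshold shift are hypothetical test data.

```python
def dnl_inl(thresholds, lsb):
    """Return (DNL, INL) in LSB from measured code-transition voltages.

    DNL[k] = (T[k+1] - T[k]) / LSB - 1, the width error of one code.
    INL[k] = running sum of DNL, i.e. the deviation of transition k from
             an ideal staircase anchored at the first transition.
    """
    dnl = [(b - a) / lsb - 1.0 for a, b in zip(thresholds, thresholds[1:])]
    inl = [0.0]
    for d in dnl:
        inl.append(inl[-1] + d)
    return dnl, inl


LSB = 1.0 / 16                                # ASSUMED 4-bit ADC, 1 V full scale
ideal = [k * LSB for k in range(1, 16)]       # 15 ideal transition levels
shifted = list(ideal)
shifted[7] += 13e-3                           # HYPOTHETICAL 13 mV offset on one comparator

dnl, inl = dnl_inl(shifted, LSB)
print("max |DNL| =", max(abs(d) for d in dnl), "LSB")
print("max |INL| =", max(abs(i) for i in inl), "LSB")
```

With this test data the shifted threshold widens one code and narrows the next by the same amount, so the INL returns to zero after the affected transition.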

### 2.5.5 Dynamic performance for interleaved ADC

The interleaved ADC model was tested with a 4.99 GHz input signal. Using MATLAB code we obtain the values of SNR, SINAD, SFDR and ENOB shown in Table 2.5.1.



Fig. 2.5.17 Output spectrum of the interleaved 10Gbps ADC showing SFDR = 33.4 dB for 4.99GHz input

| Metric   | Value     |
|----------|-----------|
| SNR_theo | 25.84 dB  |
| SINAD    | 21.73 dB  |
| SNR      | 22.46 dB  |
| THD      | -29.84 dB |
| SDR      | 29.84 dB  |
| SFDR     | 33.4 dB   |
| ENOB     | 3.32 bits |

Table 2.5.1 Dynamic performance for interleaved ADC @ 4.99GHz input signal
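The entries of Table 2.5.1 are related by the standard data-converter formulas, which the following sketch uses as a consistency check. It takes the SNR, THD and SINAD values from the table and assumes 4-bit resolution; it is not the MATLAB code used to produce the table.

```python
import math

N_BITS = 4                                        # resolution of the ADC
snr_db, thd_db, sinad_db = 22.46, -29.84, 21.73   # values from Table 2.5.1

# Ideal quantization-limited SNR of an N-bit converter with a full-scale sine.
snr_theo = 6.02 * N_BITS + 1.76

# SINAD combines the noise and distortion powers (both relative to the signal).
sinad_from_parts = -10 * math.log10(10 ** (-snr_db / 10) + 10 ** (thd_db / 10))

# Effective number of bits from the measured SINAD.
enob = (sinad_db - 1.76) / 6.02

print(f"SNR_theo = {snr_theo:.2f} dB")
print(f"SINAD from SNR and THD = {sinad_from_parts:.2f} dB")
print(f"ENOB = {enob:.2f} bits")
```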

## References:

- [1] Maxim, “<http://www.maxim-ic.com/app-notes/index.mvp/id/1023>”, accessed on October 21, 2011.
- [2] Maxim, “<http://www.maxim-ic.com/app-notes/index.mvp/id/810>”, accessed on October 21, 2011.
- [3] Robert C. Taft, Senior Member, IEEE, Chris A. Menkus, Member, IEEE, Maria Rosaria Tursi, Member, IEEE, Ols Hidri, Member, IEEE, and Valerie Pons, "A 1.8-V 1.6-GSample/s 8-b Self-Calibrating Folding ADC With 7.26 ENOB at Nyquist Frequency ,” IEEE Journal of Solid-State Circuits, VOL. 39, NO. 12, DECEMBER 2004.
- [4] Mikael Gustavsson, J. Jacob Wikner and Nianxiong Nick Tan, "CMOS Data Converters for Communications," ISBN 0-306-47305-4.
- [5] E-Hung Chen, Chih-Kong Ken Yang, “ADC-Based Serial I/O Receivers”, VOL. 57, NO. 9, SEPTEMBER 2010.
- [6] Mohammad Hekmat and Vikram Garg, "Design and Analysis of a Source-Follower Track-and-Hold Circuit," EE315 (VLSI DATA CONVERSION CIRCUITS) Project Report JUNE 2006.
- [7] Manoj Kumar and Ganesh Kumar, "Optimization techniques for source follower based track and hold circuit for high speed wireless communication," International Journal of VLSI design and Communication Systems (VLSICS) Vol.2, No.1, March 2011.
- [8] Wen-Lung Huang, Sing-Rong Li, and Yu-Wei Lin, "A Low Power 6-b ADC with 800-Ms/s Conversion Rate," Report for EECS598 ADC Final Project, Dec., 2002.
- [9] Hüseyin Dinç, Student Member, IEEE, and Phillip E. Allen, Fellow, IEEE, "A 1.2 GSample/s Double-Switching CMOS THA With -62 dB THD," IEEE Journal of Solid-State Circuits, VOL. 44, NO. 3, MARCH 2009.
- [10] Gang Chen, Yifei Luo, Allen Drake, Kuan Zhou Electrical and Computer Engineering University of New Hampshire Durham, New Hampshire , "A 5-Bit 10GS/s 65nm Flash ADC with Feedthrough Cancellation Track-and-Hold Circuit”.
- [11] M. Gustavsson et al., CMOS Data Converters for Communications (Kluwer, Boston, MA, 2000)

- [12] N. Kurosawa, H. Kobayashi, K. Maruyama, H. Sugawara, K. Kobayashi, Explicit analysis of channel mismatch effects in time-interleaved ADC systems. in IEEE T. Circuits Syst. I 48(3), 261–271 (March 2001)
- [13] S.-W. Sin, U.-F. Chio, U. Seng-Pan, R.P. Martins, Statistical spectra and distortion analysis of time-interleaved sampling bandwidth mismatch. in IEEE T. Circuits Syst. II 55(7), 648–652 (July 2008)
- [14] S.-W. Sin, Timing-Jitter Effects in Time-Interleaved Sampled-Data Systems – Generalized Noise Analysis & Mismatch-Insensitive Clock Generation, Master Thesis, University of Macau, Macao SAR, China, 2003
- [15] Y.C. Jenq, Digital spectra of nonuniformly sampled signals: fundamentals and high-speed waveform digitizers. in IEEE Trans. Instrum. Meas. 37(2), 245–251 (June 1988)
- [16] T.-H. Tsai, P.J. Hurst, S.H. Lewis, Bandwidth mismatch and its correction in time-interleaved analog-to-digital converters. in IEEE T. Circuits Syst. II 53(10), 1133–1137 (Oct 2006)
- [17] C. Vogel, D. Draxelmayr, F. Kuttner, Compensation of timing mismatches in time-interleaved analog-to-digital converters through transfer characteristics tuning, in Proceedings of 2004 Midwest Symposium on Circuits and Systems, vol. 1 (July 2004), pp. I-341–344
- [18] Z. Liu, M. Furuta, S. Kawahito, Simultaneous compensation of RC mismatch and clock skew in time-interleaved S/H circuits. in IEICE Trans. Electron. E89-C(6), 710–716 (June 2006)
- [19] J.J.F. Rijns, H. Wallinga, Spectral analysis of double-sampling switched-capacitor filters. In IEEE T. Circuits Syst. 38(11), 1269–1279 (Nov 1991)
- [20] N. Kurosawa et al., Explicit analysis of channel mismatch effects in time-interleaved ADC systems. in IEEE T. Circuits Syst. I 48(3), 261–271 (March 2001)
- [21] M. Choi and A.A. Abidi, “A 6-b 1.3-Gsamples/s AD converter in 0.35- $\mu\text{m}$  CMOS” IEEE J. Solid-State Circuits, vol. 36, no. 12, pp. 1847–1857, Dec. 2001.
- [22] A. Ismail and M. Elmasry, “A 6-Bit 1.6-GS/s low-Power wideband flash ADC converter in 0.13- $\mu\text{m}$  CMOS technology,” IEEE J. Solid-State Circuits, vol. 43, no. 9, pp. 1982–1990, Sep. 2008.
- [23] K. Deguchi, N. Suwa, M. Ito, T. Kumamoto, “A 6-bit 3.5-GS/s 0.9-V 98-mW flash ADC in 90-nm CMOS,” IEEE J. Solid-State Circuits, vol. 43, no. 10, pp. 2303–2310, Oct. 2008.

- [24] A. M. Abo and P. R. Gray, "A 1.5-V, 10-bit, 14.3-MS/s CMOS pipeline analog-to-digital converter," *IEEE J. Solid-State Circuits*, vol. 34, no. 5, pp. 599 - 606, May 1999.
- [25] B. Razavi, *Design of Analog CMOS Integrated Circuits*. New York: McGraw-Hill, 2001, ISBN 0-07-118815-0.
- [26] M. Gustavsson, et al., *CMOS Data Converters for Communications*: Kluwer Academic Publishers, 2000.
- [27] Olli Viitala, Saska Lindfors and Kari Halonen Electronic Circuit Design Laboratory, Helsinki University of Technology, P.O.Box 3000, FIN-02015 TKK Finland , "A 5-bit 1-GS/s Flash-ADC in 0.13-um CMOS Using Active Interpolation ,".
- [28] Ying-Zu Lin, Cheng-Wu Lin, and Soon-Jyh Chang, "A 5-bit 3.2-GS/s Flash ADC With a Digital Offset Calibration Scheme," *IEEE Transactions On Very Large Scale Integration (VLSI) Systems*, VOL. 18, NO.3, MARCH 2010.
- [29] Pedro M. Figueiredo and Joao C. Vital, "Offset Reduction Techniques in High Speed Analog to Digital Converters," ISBN 978-1-4020-9715-7.
- [30] Timmy Sundström and Atila Alvandpour, "A 6-bit 2.5-GS/s Flash ADC using Comparator Redundancy for Low Power in 90nm CMOS, " accepted for publication in *Journal of Analog Integrated Circuits and Signal Processing*, Springer, August 2009.
- [31] Timmy Sundström and Atila Alvandpour, "Utilizing Process Variations for Reference Generation in a Flash ADC, " in *IEEE Transactions on Circuits and Systems—II: Express Briefs*, Vol. 56, No. 5, pp. 364-368, May 2009.
- [32] Ahmed Adel, Ahmed Arafa, Dina Reda, "Equalizer Implementation for 10 Gbps Serial Data Link in 90 nm CMOS Technology", *IEEE ICM* - December 2007.
- [33] C.H. Lee, M.T. Mustaffa, K.H. Chan "Comparison of Receiver Equalization Using First-Order and Second-Order Continuous-Time Linear Equalizer in 45 nm Process Technology" *ICIAS2012*, 2011 IEEE.
- [34] Wendemagegnehu (Wendem) T. Beyene Rambus Inc. "The Design of Continuous-Time Linear Equalizers Using Model Order Reduction Techniques", 2008 IEEE.
- [35] Ph. D. dissertation of Giorgio Spelgatti "CMOS CIRCUITS FOR SERIAL LINKS BEYOND 10Gb/s"

# **Sector 3**

## **PLL and Clock & Data recovery**

## **Contents:**

### **Chapter 1: PLL**

- 3.1.1 System overview.
- 3.1.2 Phase and frequency detector.
- 3.1.3 Charge pump and Loop filter.
- 3.1.4 Voltage controlled oscillator.
- 3.1.5 Frequency divider.

### **Chapter 2: Clock and Data Recovery**

- 3.2.1 Overview on the system.
- 3.2.2 Digital control block.
- 3.2.3 Phase interpolator.

# Chapter (1)

## Phase locked loop

### 3.1.1 System overview:

The phase locked loop (PLL) is the main clock source of the whole system. The output clock of the PLL is fed to the transmitter, the receiver and the CDR.

The total jitter allowed by the standard is 0.28 UI, which is equivalent to 27.15 ps. We therefore targeted an rms jitter of 1 ps for the PLL, so that its contribution is small and the other blocks are left with more tolerance.
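The conversion from unit intervals to picoseconds can be reproduced as follows. The sketch assumes the 10GBASE-KR line rate of 10.3125 Gb/s, which is not quoted in this section but matches the 27.15 ps figure.

```python
# Jitter budget in absolute time (illustrative sketch).
BIT_RATE = 10.3125e9          # ASSUMED line rate in bit/s (10GBASE-KR)
ui = 1 / BIT_RATE             # one unit interval in seconds
total_jitter = 0.28 * ui      # total jitter budget quoted in the text
pll_rms_target = 1e-12        # planned PLL rms jitter

print(f"UI = {ui * 1e12:.2f} ps")
print(f"0.28 UI = {total_jitter * 1e12:.2f} ps")
print(f"PLL target = {pll_rms_target / ui:.4f} UI rms")
```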

The PLL is a conventional 2<sup>nd</sup> order charge pump PLL, as shown in Figure 3.1.1.



Figure 3.1.1 Conventional 2<sup>nd</sup> Order Charge Pump PLL

The transfer function of the system is given by:

$$A_\phi(s) = \frac{N(1 + s\tau_2)}{1 + s\tau_2 + \frac{s^2\tau_2}{K} + \frac{s^3\tau_2\tau_p}{K}}$$

where

$$K = \frac{b-1}{b}\,\frac{I_{cp} K_{VCO} R}{2\pi N}, \quad \tau_2 = RC_2, \quad \tau_p = R \frac{C_1 C_2}{C_1 + C_2} \quad \text{and} \quad b = 1 + \frac{C_2}{C_1}$$

As seen from the equations and the block diagram, the loop acts as a low-pass filter for the input phase and as a high-pass filter for the VCO phase, since the transfer function seen by the VCO is

$$A = \frac{1}{1+T(s)}$$

where $T(s)$ is the open-loop gain of the PLL.

The response of phase locked loops to jitter is of extreme importance in most applications. Jitter can be classified as (a) slow jitter or (b) fast jitter, and two jitter phenomena in phase locked loops are of great interest: (a) the input exhibits jitter, and (b) the VCO produces jitter. Slow jitter at the input propagates to the output unattenuated, but fast jitter does not; we say that the PLL low-pass filters the input jitter. Now suppose that the VCO suffers from jitter. The slow jitter components generated by the VCO are suppressed, but the fast jitter components are not. Figure 3.1.2 conceptually summarizes the response of PLLs to input and VCO jitter. Depending on the application and the environment, one or both sources may be significant, requiring an optimum choice of the loop bandwidth.



Figure 3.1.2 The effect of the PLL's Bandwidth

The reference block is a crystal oscillator with a frequency of 156.25 MHz. It contributes little to the jitter, as its phase noise is -145.4 dBc/Hz at 20 kHz with a 10 dB/decade roll-off, which is relatively low.

The main contributors to the output phase noise are the reference clock and the VCO. Since the reference clock has low phase noise, the main concern is the VCO phase noise, so the loop is designed for the maximum achievable bandwidth in order to suppress most of the phase noise generated by the VCO.

To avoid instability, the loop bandwidth cannot exceed $f_{ref}/10$, which is equivalent to 15.6 MHz. However, achieving this bandwidth with a phase margin of 60 degrees would require very large resistances in the loop filter. To find the component values we implemented the equations of AN1001 by Texas Instruments [1] in MATLAB code, then substituted the resulting values into the transfer function and plotted it to verify the results, as shown in the following figures and table. After running various simulations, the most appropriate bandwidth was found to be 7 MHz.



Figure 3.1.3 Obtaining value of R



Figure 3.1.4 Obtaining value of C2



Figure 3.1.5 Obtaining value of C1



Figure 3.1.6 Open loop response of the system



Figure 3.1.7 Closed loop response of the system

The values of the loop parameters are given in the following table

| Parameter             | Value      |
|-----------------------|------------|
| C1                    | 0.259pF    |
| C2                    | 3.58pF     |
| R                     | 25KΩ       |
| Icp                   | 150μA      |
| K <sub>vco</sub>      | 800MHz/V   |
| BW                    | 7MHz       |
| Phase margin          | 60 degrees |
| Estimated rms Jitter* | 47fs       |

\*The estimated rms jitter is based on the results from the phase noise profile of the VCO and then running these values on CppSim to find the estimated jitter.
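The tabulated loop parameters can be cross-checked against the transfer function given earlier. The sketch below is a Python illustration, not the MATLAB code used in this work. It assumes a divider ratio of N = 66 (a 10.3125 GHz VCO locked to the 156.25 MHz reference), which is not stated in this section, and uses the loop gain $T(s) = K(1+s\tau_2)/(s^2\tau_2(1+s\tau_p))$ implied by the closed-loop expression.

```python
import math

I_CP = 150e-6                     # charge-pump current, A
K_VCO_HZ = 800e6                  # VCO gain, Hz/V
R, C1, C2 = 25e3, 0.259e-12, 3.58e-12
N = 66                            # ASSUMED divider ratio (10.3125 GHz / 156.25 MHz)

b = 1 + C2 / C1
tau_2 = R * C2
tau_p = R * C1 * C2 / (C1 + C2)
# K takes K_VCO in rad/s/V, so the 2*pi of the conversion cancels the 1/(2*pi).
K = (b - 1) / b * I_CP * (2 * math.pi * K_VCO_HZ) * R / (2 * math.pi * N)


def loop_gain_mag(w):
    """|T(jw)| for T(s) = K (1 + s tau_2) / (s^2 tau_2 (1 + s tau_p))."""
    return K * math.hypot(1, w * tau_2) / (w ** 2 * tau_2 * math.hypot(1, w * tau_p))


# |T| falls monotonically with frequency, so bisect in log frequency for |T| = 1.
lo, hi = 2 * math.pi * 1e5, 2 * math.pi * 1e9
for _ in range(100):
    mid = math.sqrt(lo * hi)
    if loop_gain_mag(mid) > 1:
        lo = mid
    else:
        hi = mid
w_c = math.sqrt(lo * hi)
phase_margin = math.degrees(math.atan(w_c * tau_2) - math.atan(w_c * tau_p))

print(f"zero at {1 / (2 * math.pi * tau_2) / 1e6:.2f} MHz, "
      f"pole at {1 / (2 * math.pi * tau_p) / 1e6:.2f} MHz")
print(f"crossover = {w_c / (2 * math.pi) / 1e6:.2f} MHz, PM = {phase_margin:.1f} deg")
```

Under the assumed divider ratio, the crossover frequency and phase margin come out close to the 7 MHz and 60 degrees listed in the table.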

Phase domain Verilog modules were built to verify the results of the Matlab codes and the results were identical:



Figure 3.1.8 Open loop



Figure 3.1.9 Closed loop

### 3.1.2 Phase and Frequency Detector:

#### 3.1.2.1 Introduction

The phase detector (PD), or phase and frequency detector (PFD), compares the divided-down signal with the reference signal and provides an error voltage, which is ultimately fed back to control the oscillator. The average output, $\bar{V}_{out}$, is linearly proportional to the phase difference, $\Delta\phi$, between its two inputs, as shown in Figure 3.1.10. In the ideal case, the relationship between $\bar{V}_{out}$ and $\Delta\phi$ is linear and crosses the origin for $\Delta\phi = 0$. The slope of the line, $K_{PD}$, is called the gain of the PD and is expressed in $V/rad$ [1]. As the phase difference between the inputs varies, so does the width of the output pulses, thereby providing a dc level proportional to $\Delta\phi$. The operation of phase detectors is similar to that of differential amplifiers in that both sense the difference between the two inputs and generate a proportional output.



Figure 3.1.10 phase difference between two

#### 3.1.2.2 Phase detector mathematical relationship

A phase detector is nothing more than a simple analog multiplier, i.e. a mixer. Consequently, we mix two signals to show the resulting relationship [2]. First, we describe the two signals mathematically. Taking the inputs as two sinusoidal signals, Equations (1) and (2) describe the general signal inputs to the mixer:

$$V_1(t) = V_{p1} \cos(w_{rf}t + \theta_e) \quad (1)$$

$$V_2(t) = V_{p2} \cos(w_{lo}t) \quad (2)$$

Mixing (analog multiplication of) (1) and (2) produces (3):

$$V_1(t)V_2(t) = V_{p1}V_{p2} \cos(w_{rf}t + \theta_e) \cos(w_{lo}t) \quad (3)$$

Using the trigonometric identity for the product of two cosines produces (4):

$$V_1(t)V_2(t) = 0.5\,V_{p1}V_{p2} [\cos(w_{rf}t - w_{lo}t + \theta_e) + \cos(w_{rf}t + w_{lo}t + \theta_e)] \quad (4)$$

Eliminating the high-frequency product with a lowpass filter yields (5), where $V_{pbeat} = 0.5\,V_{p1}V_{p2}$ and $w_{beat} = w_{rf} - w_{lo}$:

$$\left.V_1(t)V_2(t)\right|_{LPF} = 0.5\,V_{p1}V_{p2} \cos(w_{rf}t - w_{lo}t + \theta_e) = V_{pbeat} \cos(w_{beat}t + \theta_e) \quad (5)$$

The derivative of (5) with respect to $\theta_e$ gives the incremental phase slope. For the mixer operating with a 0-beat frequency ($w_{beat} = 0$, which is dc), the derivative of (5) produces (6):

$$V_{pbs} = \frac{d}{d\theta_e}\left[V_{pbeat} \cos(\theta_e)\right] = -V_{pbeat} \sin(\theta_e) \quad (6)$$

For a phase error $\theta_e$ equal to $\pi/2$ rad (quadrature) in (6), the magnitude of the phase slope $V_{pbs}$ equals the peak voltage $V_{pbeat}$, so the gain of the phase detector (V/rad) is $K_d = V_{pbeat}$. For $\theta_e$ equal to 0 rad in (6), $K_d$ equals 0 V/rad. This shows that maximum phase sensitivity occurs for a $90^\circ$ phase difference between the input signals, while a minimum phase sensitivity of 0 occurs for a phase difference of $0^\circ$. With a 0-Hz beat frequency in (5), adjusting the phase to a $90^\circ$ phase difference produces 0 V at the intermediate frequency (IF) port of the mixer and gives maximum phase sensitivity for a measurement. Adjusting the phase to a $0^\circ$ phase difference produces the maximum voltage and gives minimum phase sensitivity for a measurement. The terms LO, RF, and IF come from receiver terminology, where the LO is the oscillator in the receiver, the RF is the signal from the antenna, and the IF is the down-converted frequency, which in this case is baseband (dc).
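The mixer relationship can be verified numerically. The sketch below multiplies two unit-amplitude cosines of equal frequency (the 0-beat case), averages over whole periods to emulate an ideal lowpass filter, and estimates the phase slope by a finite difference. The function names and unit amplitudes are illustrative choices.

```python
import math


def mixer_dc(theta_e, vp1=1.0, vp2=1.0, cycles=10, steps_per_cycle=1000):
    """Average of V1(t)*V2(t) over an integer number of periods (w_rf == w_lo).

    Averaging over whole periods removes the double-frequency term, so only
    the dc (zero-beat) term 0.5*vp1*vp2*cos(theta_e) remains.
    """
    n = cycles * steps_per_cycle
    total = 0.0
    for i in range(n):
        wt = 2 * math.pi * i / steps_per_cycle
        total += vp1 * math.cos(wt + theta_e) * vp2 * math.cos(wt)
    return total / n


def phase_slope(theta_e, d=1e-4):
    """Central-difference estimate of d(V_dc)/d(theta_e) in V/rad."""
    return (mixer_dc(theta_e + d) - mixer_dc(theta_e - d)) / (2 * d)


for deg in (0, 45, 90):
    th = math.radians(deg)
    print(f"{deg:3d} deg: V_dc = {mixer_dc(th):+.4f} V, "
          f"slope = {phase_slope(th):+.4f} V/rad")
```

At quadrature the dc output is zero while the slope magnitude is largest; at zero phase difference the output peaks and the slope vanishes.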



Figure 3.1.11 phase and frequency detector response with (a)  $\omega_A < \omega_B$  (b) A lagging B.

Illustrated in Figure 3.1.11, the operation of a typical PFD is as follows. If the frequency of input A, $\omega_A$, is greater than that of input B, $\omega_B$, then the PFD produces positive pulses at $Q_A$, while $Q_B$ remains at zero. Conversely, if $\omega_A < \omega_B$, then positive pulses appear at $Q_B$ while $Q_A = 0$. This assignment is the one consistent with the state diagram and the flip-flop implementation described below, where edges on A set $Q_A$. If $\omega_A = \omega_B$, then the circuit generates pulses at either $Q_A$ or $Q_B$ with a width equal to the phase difference between the two inputs. (Note that, in principle, $Q_A$ and $Q_B$ are never high simultaneously.) Thus, the average value of $Q_A - Q_B$ is an indication of the frequency or phase difference between A and B. The outputs $Q_A$ and $Q_B$ are usually called the "UP" and "DOWN" signals [3] [4].

To arrive at a circuit with the above behavior, there must be at least three logical states: $Q_A = Q_B = 0$ (State 0); $Q_A = 0, Q_B = 1$ (State II); and $Q_A = 1, Q_B = 0$ (State I). The circuit actually has four states, but the fourth is simply a reset state. Even though the tri-state phase detector is the most common phase detector in use today, many variations are possible. Also, to avoid dependence of the output on the duty cycle of the inputs, the circuit should be implemented as an edge-triggered sequential machine. The assumption is that the circuit can change state only on the rising transitions of A and B. Figure 3.1.12 shows a state diagram summarizing the operation. If the PFD is in the "ground" state, $Q_A = Q_B = 0$, then a transition on A takes it to State I, where $Q_A = 1, Q_B = 0$. The circuit remains in this state until a transition occurs on B, upon which the PFD returns to State 0. The switching sequence between States 0 and II is similar.



Figure 3.1.12 PFD state diagram.

An important point in this state diagram is that if, for example,  $\omega_A > \omega_B$ , then there will be a time interval during which two transitions of A take place between two transitions of B. This ensures that, even if the PFD begins in State II, it will eventually leave that state and thereafter toggle between States 0 and I.
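The state diagram can be exercised with a small event-driven model. The sketch below is an idealized tri-state PFD (no reset delay), with the state encoded as +1 for State I, 0 for State 0 and -1 for State II; the edge times are arbitrary test data chosen so that input A is faster than input B.

```python
def pfd_trace(edges_a, edges_b, start_state=0):
    """Ideal tri-state PFD driven by rising-edge times of inputs A and B.

    State encoding: +1 = State I (QA=1, QB=0), 0 = State 0, -1 = State II
    (QA=0, QB=1). An edge on A moves the state up, an edge on B moves it down.
    Returns a list of (time, state) pairs after each edge.
    """
    events = sorted([(t, "A") for t in edges_a] + [(t, "B") for t in edges_b])
    state = start_state
    trace = []
    for t, source in events:
        if source == "A":
            state = min(state + 1, 1)
        else:
            state = max(state - 1, -1)
        trace.append((t, state))
    return trace


# Input A (period 1.0) is faster than input B (period 1.37); start in State II.
edges_a = [0.1 + k * 1.0 for k in range(20)]
edges_b = [0.5 + k * 1.37 for k in range(14)]
trace = pfd_trace(edges_a, edges_b, start_state=-1)

for t, s in trace[:10]:
    print(f"t = {t:5.2f}: state {s:+d}")
```

In this run the model leaves State II after a few cycles and then only toggles between States 0 and I, as argued above.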

The PFD characteristic is shown in Figure 3.1.13 [5].



Figure 3.1.13 plot of the averaged PFD output signal  $\bar{u}_d$  vs. phase error  $\theta_e$ .  $\bar{u}_d$  does not depend on the duty cycle of  $u_1$  and  $u_2$ .

#### 3.1.2.3 Circuit implementation

#### a- Topology 1

The implementation of the above PFD is shown in Figure 3.1.14. The circuit consists of two edge-triggered, resettable D flip-flops with their D inputs connected to logical ONE. Signals A and B act as the clock inputs of DFF<sub>A</sub> and DFF<sub>B</sub>, respectively. If $Q_A = Q_B = 0$, a transition on A causes $Q_A$ to go high. Subsequent transitions on A have no effect on $Q_A$, and when B goes high, the AND gate activates the reset of both flip-flops. Thus, $Q_A$ and $Q_B$ are simultaneously high for a duration given by the total delay through the AND gate and the reset path of the flip-flops.



Figure 3.1.14 PFD implementation.

PFDs are usually constructed using basic digital cells. The PFD implemented in the design of the phase-locked loop (PLL) is shown in Figure 3.1.15.

The reference (REF) and the output of the divider (FDBK) are each fed into the clock input of one of the flip-flops. A rising edge of either the reference or the divider output immediately causes the corresponding flip-flop's output to go high. Once the other output is also high, the NOR gate produces a zero, which resets both flip-flops to the zero state until the next rising edge of either input arrives. Figure 3.1.16 shows the output signals (UP) and (DN), plotted when FDBK is leading in case (a) and when REF is leading in case (b).



Figure 3.1.15 Topology 1 of PFD Circuit.



Figure 3.1.16 the output UP and DN signals with phase error of 500 ps (a) FDBK is leading REF (b) REF is leading FDBK.

Table I: Characteristics of the first topology

| Number of Transistors | Power    |
|-----------------------|----------|
| 44                    | 99.18 µW |

#### b- Topology 2



Figure 3.1.17 Topology 2 of PFD Circuit.

| Number of Transistors | Power   |
|-----------------------|---------|
| 38                    | 90.8 µW |

#### 3.1.2.4 Non-ideal Effect in PFD

##### a- Dead Zone in PFDs

Several imperfections in the PFD/CP circuit lead to high ripple on the control voltage even when the loop is locked [1]. The conventional PFD implementation generates narrow, coincident pulses on both $Q_A$ and $Q_B$ even when the input phase difference is zero. For a small phase difference between the inputs of the phase detector, only narrow pulses are required at the output. As illustrated in Figure 3.1.18, if A and B rise simultaneously, so do $Q_A$ and $Q_B$, thereby activating the reset. That is, even when the PLL is locked, $Q_A$ and $Q_B$ simultaneously turn on the charge pump for a finite period $T_P \approx 10 T_D$, where $T_D$ denotes the gate delay.



Figure 3.1.18 coincident pulses generated by PFD with zero phase difference.

What are the consequences of the reset pulses on $Q_A$ and $Q_B$? To understand why these pulses are desirable, we consider a hypothetical PFD that produces no pulses for zero input phase difference [Figure 3.1.19(a)]. How does such a PFD respond to a small phase error? As shown in Figure 3.1.19(b), the circuit generates very narrow pulses on $Q_A$ or $Q_B$.



Figure 3.1.19 output waveforms of a hypothetical PD with (a) zero input phase difference and (b) a small input phase difference.

However, owing to the finite rise and fall times resulting from the capacitance seen at these nodes, the pulse may not have enough time to reach a logical high level, and thus fails to turn on the charge pump switches. In other words, if the input phase difference, $\Delta\phi$, falls below a certain value $\Delta\phi_0$, then the output voltage of the PFD/CP/LPF combination is no longer a function of $\Delta\phi$. As depicted in Figure 3.1.20, for $|\Delta\phi| < \Delta\phi_0$ the charge pump injects no current, so the loop gain drops to zero and the output phase is not locked. We say that the PFD/CP circuit suffers from a dead zone equal to $\pm \Delta\phi_0$ around $\Delta\phi = 0$.



Figure 3.1.20 Dead zone in the charge pump current

The dead zone is highly undesirable because it allows the VCO to accumulate as much random phase error as $\Delta\phi_0$ with respect to the input while receiving no corrective feedback. The coincident pulses on $Q_A$ and $Q_B$ remove the dead zone: as illustrated in Figure 3.1.21, when both outputs already reach a valid level, an infinitesimal increment in the phase difference results in a proportional increase in the net current produced by the charge pump. In other words, the dead zone vanishes if $T_P$ is long enough to allow $Q_A$ and $Q_B$ to reach a valid logical level and turn on the switches in the charge pump.



Figure 3.1.21 Coincident pulses generated by PFD with zero phase difference

The dead zone occurs primarily because of the difference between the rise time of the latch outputs and the delay of the reset path. To illustrate this point, suppose that the flip-flops in a PFD have a rise time of $t$, as shown in Figure 3.1.22. Further, suppose that in order to turn on the charge pump, the output must reach a full logic level. Now, also suppose that the threshold voltage of the NOR gate is one-half of a logic level and that it resets the flip-flops much faster than they rise. The result is that the NOR gate resets the flip-flops at time $t/2$ after the second pulse starts to rise. If the pulses are nearly in phase, this means that the charge pump is never turned on, as illustrated in Figure 3.1.23 [4].



Figure 3.1.22 A PFD with finite rise time but large difference in phase between

Thus, unless the time difference in the arrival of the pulses from the reference and the output is greater than  $t/2$ , the charge pump and PFD will remain inactive. This is called the dead zone, and for these small differences in phase, the loop is open, and no feedback will take place. Thus, the loop will only respond to differences in phase greater than

$$\text{Dead zone edge} = \pm \frac{t\pi}{T}$$

where $T$ is the reference period. Thus, the dead zone will increase with a higher reference frequency or with an increased delay in the output.
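As a numerical illustration of the dead-zone expression, the sketch below evaluates $\pm t\pi/T$ for the 156.25 MHz reference used in this design. The 50 ps rise time is a hypothetical value chosen only for the example, not a measured result.

```python
import math


def dead_zone_edge(rise_time, f_ref):
    """Half-width of the dead zone in radians: pi * t / T."""
    period = 1.0 / f_ref
    return math.pi * rise_time / period


F_REF = 156.25e6        # reference frequency of this design, Hz
T_RISE = 50e-12         # HYPOTHETICAL flip-flop rise time, s

edge_rad = dead_zone_edge(T_RISE, F_REF)
edge_time = T_RISE / 2  # the same limit expressed as an arrival-time difference

print(f"dead zone edge = +/-{edge_rad:.4f} rad "
      f"(+/-{math.degrees(edge_rad):.2f} deg, +/-{edge_time * 1e12:.0f} ps)")
```

Doubling the reference frequency in this model doubles the dead zone in radians, as the expression predicts.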



Figure 3.1.23 A PFD with rise time and small difference in phase between inputs.

The dead zone is undesirable in a phase-locked system: if the phase difference between the input and output varies within the zone, the dc output of the charge pump does not change significantly and the loop fails to correct the resulting error. Consequently, a peak-to-peak jitter approximately equal to the width of the dead zone can arise in the output. One way to combat the dead zone is to add delay to the feedback (reset) path. If enough delay is added to make the reset time comparable to the delay in the forward path, the dead-zone problem becomes less severe. For example, assume that delay is added in the feedback path so that the time for the AND or NOR gate to reset the flip-flops is also $t$. Now assume that the phase difference between input and output is $\Delta$, and that $\Delta$ is less than $t/2$. Then, with an equal delay in the reset path, the output reaches a logic-one level even for very small differences in phase, as shown in Figure 3.1.24. Adding further delay to the feedback path causes both current sources to be on simultaneously, which is undesirable from a noise, spur, and power-consumption point of view, but does remove the dead-zone problem.



Figure 3.1.24 Plot of the output with delay inserted into the reset path.

##### b- The Continuous Time Approximation

Technically, the phase and frequency detector puts out a pulse width modulated signal and not a continuous current. However, it greatly simplifies calculations to approximate the charge pump current as a continuous current with a magnitude equal to the time-averaged value of these currents from the charge pump. This approximation is referred to as the continuous time approximation and is a good approximation provided that the loop bandwidth is no more than about one-tenth of the comparison frequency. This approximation loses accuracy as the loop bandwidth approaches the comparison frequency. Despite this fact, this approximation holds very well in most cases and is used in order to derive the transfer functions that are necessary to analyze the PLL system. The discrete sampling effects that are not accounted for in the continuous time approximation introduce minor errors in the calculation of many performance criteria, such as the spurs, phase noise, and the transient response [6].

The charge pump is mainly composed of two current sources and two switches that control which current source is active, as shown in Figure 3.1.25. The value of the current with which the loop filter is charged or discharged is determined by the stability equations of the PLL loop.

If the feedback signal from the divider lags the reference clock signal, the “Up” output signal of the PFD is high (“Up\_bar” is used for the PMOS switches) for a certain period, equal to the phase difference between the two signals. This causes the switch of the upper branch of the charge pump to be ON, thus charging the loop filter. This increases the control voltage to the VCO, causing the output frequency from the VCO to increase.

If the feedback signal leads the reference clock signal, the “Down” signal of the PFD becomes high for a period equal to the phase difference between the two signals. The switch of the lower branch turns ON, causing the discharge of the loop filter & the decrease of the control voltage, thus decreasing the output frequency from the VCO.



Figure 3.1.25 Charge Pump



Figure 3.1.26 Loop Filter



Figure 3.1.27 The  $V_{out}$  of the charge pump

The loop filter used is a 2<sup>nd</sup> order low-pass filter (shown in Figure 3.1.26). It converts the current from the charge pump into a voltage & passes an almost constant control voltage to the VCO. The parameters of the loop filter are also chosen according to the stability requirements of the PLL loop.

The values of the current of the sources & the loop filter parameters are shown in the following table:

| Parameter               | Value    |
|-------------------------|----------|
| $I_{up}$ (& $I_{down}$) | 150 µA   |
| R                       | 25 kΩ    |
| C1                      | 0.259 pF |
| C2                      | 3.58 pF  |
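As a rough check of these values, the zero and the higher pole of the loop filter can be estimated. The sketch below assumes the common passive arrangement in which R is in series with the larger capacitor (C2) and the smaller capacitor (C1) shunts the output; this arrangement is an assumption, since it depends on the schematic in Figure 3.1.26.

```python
import math

R = 25e3         # ohms
C1 = 0.259e-12   # farads (assumed to be the shunt capacitor)
C2 = 3.58e-12    # farads (assumed to be in series with R)


def loop_filter_zero_pole(r, c_series, c_shunt):
    """Return (f_zero, f_pole) in hertz for a series R-C branch with a shunt C.

    The zero is set by R and the series capacitor; the higher pole is set
    by R and the series combination of the two capacitors.
    """
    f_zero = 1.0 / (2.0 * math.pi * r * c_series)
    c_eq = c_series * c_shunt / (c_series + c_shunt)
    f_pole = 1.0 / (2.0 * math.pi * r * c_eq)
    return f_zero, f_pole


f_z, f_p = loop_filter_zero_pole(R, c_series=C2, c_shunt=C1)
print(f"zero ~ {f_z / 1e6:.2f} MHz, pole ~ {f_p / 1e6:.2f} MHz")
```

With this assumed arrangement the zero falls below the pole, which is the ordering needed for the zero to contribute phase margin around the loop bandwidth.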

### 3.1.3 Design issues in Charge Pump:

When designing the charge pump & choosing the suitable topology, several issues should be considered:

### a) Current mismatch:

Current sources do not behave ideally (they do not maintain a constant current) over the whole range as the control voltage changes from rail to rail, so the currents of the upper & lower current sources differ in value when the control voltage gets close to either rail. The current mismatch can be controlled through the drain-source saturation voltage of the current-source transistors & through their channel lengths.

The lower the drain-source saturation voltage $V_{DS,SAT}$, the closer to the rails the charge pump can operate properly, as the transistors have a wider range in which they remain in saturation, where the current is almost constant, according to the following relations:

$$I_D = \frac{1}{2} (\mu C_{ox}) \left(\frac{W}{L}\right) (V_{GS} - V_T)^2$$

$$V_{DS,SAT} = V_{GS} - V_T = \sqrt{\frac{2 I_D}{\mu C_{ox}} \left(\frac{L}{W}\right)}$$
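The square-law relation above can be evaluated numerically to see how the aspect ratio sets the saturation voltage. This is a minimal sketch: the 150 µA current is the value used in this design, while the process transconductance parameter and the aspect ratios are hypothetical values chosen only for illustration.

```python
import math


def vds_sat(i_d, mu_cox, w_over_l):
    """Saturation voltage from the square-law model: sqrt(2*I_D / (mu*Cox * W/L))."""
    return math.sqrt(2.0 * i_d / (mu_cox * w_over_l))


I_D = 150e-6      # amperes, charge-pump current of this design
MU_COX = 300e-6   # A/V^2, hypothetical process parameter

for w_over_l in (25.0, 100.0, 400.0):
    print(f"W/L = {w_over_l:5.0f} -> V_DS,SAT = {vds_sat(I_D, MU_COX, w_over_l) * 1e3:.0f} mV")
```

In this model, quadrupling the aspect ratio at a fixed current halves $V_{DS,SAT}$, which widens the output range over which the current source stays in saturation.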



Figure 3.1.28 Current source output currents for a charge pump made with transistors that have a finite output impedance

The other factor affecting the current mismatch is the channel length of the transistors. Actual current sources have a finite output impedance, which causes the current in saturation to have a slope instead of being constant. This adds a factor to the current relation that contains $\lambda$, the channel-length modulation parameter. The output impedance $R_{DS}$ is inversely proportional to $\lambda$, & $\lambda$ is inversely proportional to the channel length. So increasing the length increases $R_{DS}$, which decreases the slope of the current in saturation & makes the current more constant.

$$I_D = \frac{1}{2} (\mu C_{ox}) \left(\frac{W}{L}\right) (V_{GS} - V_T)^2 (1 + \lambda(V_{DS} - V_{DS,SAT}))$$

$$R_{DS} = \frac{1}{\lambda I_D} \propto \frac{L}{I_D}$$



Figure 3.1.29 The effect of doubling the channel length
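The effect of the channel length on the output resistance can be sketched numerically. The channel-length modulation parameter used here is a hypothetical value, and the model simply assumes that $\lambda$ scales as $1/L$, as stated above.

```python
def output_resistance(lam, i_d):
    """Small-signal output resistance R_DS = 1 / (lambda * I_D)."""
    return 1.0 / (lam * i_d)


def scaled_lambda(lam_ref, length_ratio):
    """Channel-length modulation parameter after scaling L by length_ratio."""
    return lam_ref / length_ratio


I_D = 150e-6    # amperes, charge-pump current of this design
LAM_REF = 0.2   # 1/V, hypothetical value at the reference channel length

r_ref = output_resistance(LAM_REF, I_D)
r_double = output_resistance(scaled_lambda(LAM_REF, 2.0), I_D)
print(f"R_DS at L:  {r_ref / 1e3:.1f} kOhm")
print(f"R_DS at 2L: {r_double / 1e3:.1f} kOhm")
```

At a fixed current, doubling the length doubles $R_{DS}$ in this model, so the slope of the current in saturation is halved.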

### b) Charge injection:

When a MOS transistor acting as a switch suddenly turns off (the gate voltage swings quickly from one supply rail to the other), the charge in the transistor's inversion layer is forced to leave the channel through the source & drain. Any charge that leaves through these terminals towards the output node results in a charge injection error. Charge injection causes the control voltage to change even after the switches are turned off.



Figure 3.1.30 Showing the charge injection in an NMOS switch

### c) Charge sharing:

In the example shown in Figure 3.1.31, when the switches S1 & S2 are off, the nodes X & Y are pulled to the rails, charging the parasitic capacitances at these nodes: the node at the PMOS current source is pulled up to VDD & the node at the NMOS current source is pulled down to ground. When the switches turn ON, the charge is redistributed between the parasitic capacitances and the loop filter, causing transients on the control voltage, which can lead to reference feedthrough.



Figure 3.1.31 Showing the effect of charge sharing on the control voltage

### d) Reference clock feedthrough:

The reference clock feedthrough is a phenomenon that causes an AC signal on the control voltage at the reference frequency. It occurs during the switching of the clock signal at the gates of the switches, leading to a change in the drain/source voltage through $C_{gd}$ and $C_{gs}$ coupling. This can cause spurs in the PLL's output spectrum at frequencies equal to ($f_{out} \pm n f_{ref}$), as shown in Figure 3.1.33. Reference clock feedthrough is mainly caused by current mismatch, charge injection and charge sharing.



Figure 3.1.32 The effect of current mismatch on the control voltage (assuming here zero phase difference between the reference & feedback signals)



Figure 3.1.33 Spurs in the output spectrum of the PLL loop
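For the frequencies used in this PLL (a 10.3125 GHz output and a 156.25 MHz reference), the expected spur locations $f_{out} \pm n f_{ref}$ can be listed with a few lines of code. The function name is hypothetical and the sketch only evaluates the formula; it says nothing about the spur amplitudes.

```python
F_OUT = 10.3125e9   # PLL output frequency, in hertz
F_REF = 156.25e6    # reference frequency, in hertz


def spur_frequencies(f_out, f_ref, n_max):
    """Return (n, lower spur, upper spur) for n = 1 .. n_max."""
    return [(n, f_out - n * f_ref, f_out + n * f_ref) for n in range(1, n_max + 1)]


for n, f_low, f_high in spur_frequencies(F_OUT, F_REF, 3):
    print(f"n = {n}: {f_low / 1e9:.5f} GHz and {f_high / 1e9:.5f} GHz")
```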

### 3.1.3.1 Topologies & Simulations:

#### a) First Topology:

The first topology that was tried is shown in figure 3.1.34. At first, we tried to get the compliance curve that shows the range in which the charge pump will operate properly (with almost no current mismatch). This is achieved by adjusting the  $V_{DS,SAT}$  (adjusting the  $V_{GS}$ ) of the current sources & current mirrors ( $M_6, M_7$ ).

The transistors used as the current sources in the main branches of the charge pump,  $M_1$  &  $M_4$  are sized in order to get the current value needed in the saturation mode, which is  $150 \mu\text{A}$ . The channel length is increased in order to get an almost flat curve in the saturation mode range. Also increasing the  $L$  of the current sources over 2 or 3 times the minimum length helps the circuit pass the corners' analysis.

The switches are also sized to act more as ideal switches. The channel length is left minimum sized, but the width is increased to decrease  $R_{ON}$ , as it is inversely proportional to the aspect ratio ( $\frac{W}{L}$ ), thus letting the switches take almost no headroom.



Figure 3.1.34 First Topology

From the DC setup in Figure 3.1.35, we sweep the output voltage from 0 to 1 V (VDD) & plot the output current against it to get the compliance curve, which shows the range in which the charge pump operates with little current mismatch. The compliance curve is shown in Figure 3.1.37. Here the range achieved for a mismatch of less than 10% is between 0.13 V & 0.86 V. The VCO must also operate properly within this control-voltage range.



Figure 3.1.35 The setup for DC analysis to get the compliance curve in the first topology



Figure 3.1.36 The  $I_D$  VS  $V_{DS}$  for the NMOS & PMOS current sources in the first topology



Figure 3.1.37 The charge pump's compliance curve in the first topology
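The way a compliance range is read off the swept currents can be sketched as follows. The current models below are hypothetical piecewise-linear stand-ins for the simulated curves, not the simulation data of this work; only the 10% mismatch limit, the 150 µA current and the 0 to 1 V sweep are taken from the text.

```python
def mismatch(i_up, i_dn):
    """Relative mismatch between the two currents, as a fraction of their mean."""
    mean = 0.5 * (i_up + i_dn)
    return abs(i_up - i_dn) / mean if mean > 0 else float("inf")


def compliance_range(v_out, i_up, i_dn, limit=0.10):
    """Return (v_low, v_high) of the longest contiguous run with mismatch < limit."""
    best, start = None, None
    for k, (u, d) in enumerate(zip(i_up, i_dn)):
        ok = mismatch(u, d) < limit
        if ok and start is None:
            start = k
        if start is not None and (not ok or k == len(v_out) - 1):
            end = k if ok else k - 1
            if best is None or end - start > best[1] - best[0]:
                best = (start, end)
            start = None
    return None if best is None else (v_out[best[0]], v_out[best[1]])


I0 = 150e-6    # nominal current, amperes
VDSAT = 0.12   # hypothetical saturation voltage of both sources, volts


def sink_current(v):      # NMOS source: current collapses near ground
    return I0 * min(1.0, v / VDSAT)


def source_current(v):    # PMOS source: current collapses near VDD = 1 V
    return I0 * min(1.0, (1.0 - v) / VDSAT)


v_sweep = [k / 100 for k in range(101)]
v_low, v_high = compliance_range(
    v_sweep,
    [source_current(v) for v in v_sweep],
    [sink_current(v) for v in v_sweep],
)
print(f"compliance range of the toy model: {v_low:.2f} V to {v_high:.2f} V")
```

The same `compliance_range` helper could be applied to exported simulator data by replacing the toy current lists with the swept values.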

The transient analysis is done by joining the PFD with the charge pump in order to see how they operate together. The setup is shown in Figure 3.1.38. We can see from Figures 3.1.39 & 3.1.40 that the charge sharing affects the control voltage dramatically and that there is some charge injection that causes the control voltage to change while the switches are OFF.



Figure 3.1.38 The transient setup of the first topology with the loop filter and the PFD



Figure 3.1.39 The transient waveforms of the V control with the Down signal decreasing it, in the first topology



Figure 3.1.40 The transient waveforms of the V control with the up\_bar signal increasing it, in the first topology

#### b) Second Topology:

In the second topology, the switches are placed near the rails, away from the output node. This removes the effect of charge sharing & prevents switching errors from reaching the output voltage. Charge injection is also reduced because the charge is injected into the rails, in addition to the presence of the transistors Mx & My, which decrease both charge injection & clock feedthrough.

The sizing of the transistors is done in the same way as in the first topology. In the DC analysis, two additional terminals are taken from the PFD for the up & down\_bar signals, as shown in Figure 3.1.42. The compliance curve is slightly different: the range is from 0.12 V to 0.85 V.

The difference appears in the transient analysis, where the charge sharing is much less than that in the first topology, but charge injection still exists. This can be seen in figures 3.1.46 & 3.1.47.



Figure 3.1.41 Second Topology



Figure 3.1.42 The setup for DC analysis to get the compliance curve in the second topology



Figure 3.1.43 The  $I_D$  VS  $V_{DS}$  for the NMOS & PMOS current sources in the second topology



Figure 3.1.44 The charge pump's compliance curve in the second topology



Figure 3.1.45 The transient setup of the second topology with the loop filter and the PFD



Figure 3.1.46 The transient waveforms of the V control with the down signal decreasing it, in the second topology



Figure 3.1.47 The transient waveforms of the V control with the up\_bar signal increasing it, in the second topology

#### c) Third Topology:

In the third topology, the current sources are kept on always & the  $V_{DS}$  is kept constant using a unity gain buffer. This decreases mismatch & charge sharing.



Figure 3.1.48 Third Topology

The unity-gain buffer is built from a differential pair with active loads, designed for high gain. The buffer gives almost unity gain over most of the operating range, as shown in Figure 3.1.50.



Figure 3.1.49 The Buffer



Figure 3.1.50 Plotting the input & output voltages of the buffer

The compliance curve is the same as that of the first topology. From the transient analysis, we can see that the charge sharing is reduced, but the charge injection is large.



Figure 3.1.51 The transient waveforms of the V control with the down signal decreasing it, in the third topology



Figure 3.1.52 The transient waveforms of the V control with the up\_bar signal increasing it, in the third topology

### 3.1.3.2 Comparison:

|                                          | **Topology 1** | **Topology 2** | **Topology 3** |
|------------------------------------------|----------------|----------------|----------------|
| **Power supply**                         | 1 V            | 1 V            | 1 V            |
| **Power (when both switches are on)**    | 204.9 µW       | 198.42 µW      | 861.93 µW      |
| **Phase noise (at 10 MHz offset)**       | -140.56 dBc/Hz | -113.01 dBc/Hz | -120.67 dBc/Hz |
| **Charge sharing**                       | Not decreased  | Decreased      | Decreased      |
| **Charge injection**                     | Not decreased  | Decreased      | Not decreased  |
| **Compliance range**                     | 0.13 V – 0.86 V | 0.12 V – 0.85 V | 0.13 V – 0.86 V |

The second topology is chosen because it decreases both charge sharing & charge injection. Also it was found that it has the least static power consumption.

### 3.1.3.3 Phase noise profile:

The phase noise is calculated for the PFD, the charge pump & the loop filter combined together. Figure 3.1.53 shows the contribution of the PFD, charge pump & the loop filter in the PLL's phase noise.



Figure 3.1.53 Phase Noise due to PFD, Charge Pump & Loop Filter

### 3.1.3.4 Corners Analysis:

Corner analysis is performed at temperatures of 0, 27 & 85 degrees Celsius and at supply voltages of 0.9, 1 & 1.1 volts.

#### (a) Typical-Typical:



Figure 3.1.54 Compliance curve of topology 2 in typical-typical



Figure 3.1.55 Phase noise profile in typical-typical

#### (b) Slow-Slow:



Figure 3.1.56 Compliance curve of topology 2 in slow-slow



Figure 3.1.57 Phase noise profile in slow-slow

#### (c) Fast-Fast:



Figure 3.1.58 Compliance curve of topology 2 in fast-fast



Figure 3.1.59 Phase noise profile in fast-fast

### 3.1.4 Voltage Controlled Oscillator:

#### 3.1.4.1 Introduction

An oscillator is a circuit that generates a periodic waveform. Oscillators have numerous applications, from serving as reference tone generators for receivers to clocks for digital circuits.

An ideal VCO is a circuit that generates a periodic output whose frequency is a linear function of a control voltage.

$$\omega_{out} = \omega_{FR} + K_{VCO} V_{cont}$$

Where

$\omega_{FR}$  is the free running frequency

$\omega_{out}$  is the output frequency

$K_{VCO}$  is the oscillator gain

$V_{cont}$  is the control voltage
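The linear tuning law can be turned into a small calculation of the control voltage needed for a target frequency. The sketch works in hertz rather than rad/s, which only rescales both sides of the relation. The gain of 800 MHz/V is the value reported for the final oscillator later in this chapter, while the free-running frequency of 10 GHz is a hypothetical value chosen for the example.

```python
def vco_frequency(f_free, k_vco, v_cont):
    """Ideal VCO output frequency: f_free + k_vco * v_cont (all in Hz and Hz/V)."""
    return f_free + k_vco * v_cont


def control_voltage_for(f_target, f_free, k_vco):
    """Control voltage that an ideal linear VCO needs to reach f_target."""
    return (f_target - f_free) / k_vco


F_FREE = 10.0e9      # hypothetical free-running frequency, hertz
K_VCO = 800e6        # VCO gain, hertz per volt
F_TARGET = 10.3125e9

v_needed = control_voltage_for(F_TARGET, F_FREE, K_VCO)
print(f"required control voltage: {v_needed:.4f} V")
```

A real VCO is only approximately linear, so this number should be read as a first-order estimate.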

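An oscillator can be viewed as a feedback circuit. For a negative-feedback loop with open-loop transfer function $H(s)$, the *Barkhausen criteria* must be satisfied at the oscillation frequency $\omega_o$ for the circuit to oscillate: the loop gain must be at least unity and the phase shift of $H$ must be $180^\circ$, which together with the inversion of the negative feedback gives a total loop phase of $360^\circ$, i.e. positive feedback at $\omega_o$.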

Barkhausen Criterion:

$$|H(j\omega_o)| \geq 1 \quad , \quad \angle H(j\omega_o) = 180^\circ$$



Figure 3.1.60 Barkhausen criterion

### 3.1.4.2 Oscillator topologies:

There are two main topologies for VCO design:

#### 3.1.4.2.1 Ring Oscillator:



Figure 3.1.61 Ring oscillator

A ring oscillator is usually made up of an odd number of inverters or delay cells with the output fed back to the input, as shown in the figure.

The main advantage of this topology is the significant reduction in area, as there are no inductors or varactors in the circuit. The tuning range of the ring oscillator is also relatively large. Another important advantage is that different phases can be obtained from the oscillator without the need for external circuitry.

However the major drawback of this topology is the relatively large phase noise.

#### 3.1.4.2.2 LC VCO:

The LC-based VCO has nearly the opposite features of the ring oscillator. Its tuning range is relatively small, which may require dividing the range into bands with switched capacitors to extend it, and its layout area is large due to the inductors and varactors in the circuit. However, the LC VCO shows much better phase noise performance.

The following table summarizes the tradeoffs between the two oscillators:

|                   | Ring Oscillator                                | LC Oscillator                  |
|-------------------|------------------------------------------------|--------------------------------|
| Passive Devices   | No Passive devices so it's easier to implement | Spiral inductors and varactors |
| Area              | Smaller area                                   | Larger area                    |
| Tuning Range      | Large tuning range                             | Smaller tuning range           |
| Frequency         | High frequency                                 | Much higher frequencies        |
| Phase noise       | Poor phase noise performance                   | Better phase noise performance |
| Power Consumption | Higher power consumption                       | Lower power consumption        |

Since the required output jitter from the PLL is 1ps rms, we have chosen to work with the LC VCO in order to achieve the lowest possible phase noise from the oscillator.

### 3.1.4.2.3 -Gm LC Oscillator:



Figure 3.1.62 -Gm oscillator

LC oscillators have various topologies, such as the Colpitts and -Gm oscillators. We have chosen the -Gm oscillator because, for the same quality factor, the Colpitts oscillator requires a higher gain than the -Gm oscillator, and since the Q factor is low in most CMOS technologies, the -Gm oscillator is preferable.

Apart from the feedback view, the -Gm oscillator can be viewed as a one-port network: an LC tank with a negative resistance that compensates the losses of the tank so that oscillation can be sustained.



Figure 3.1.63 -gm LC Oscillator

The negative resistance can be achieved by using a cross-coupled transistor pair as shown in the figure.
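The amount of negative resistance needed can be estimated from the tank losses. The sketch below uses the inductor values listed later in this section (L = 186.5 pH, Q = 20.6) and assumes that they are differential values valid at 10.3125 GHz, that the inductor dominates the tank losses, and that a cross-coupled pair presents a differential resistance of $-2/g_m$. These are assumptions made for illustration only.

```python
import math


def tank_parallel_resistance(q, f, l):
    """Equivalent parallel resistance of a lossy inductor: R_p = Q * 2*pi*f * L."""
    return q * 2.0 * math.pi * f * l


def min_pair_gm(r_p):
    """Minimum transconductance so that the pair's -2/gm cancels R_p."""
    return 2.0 / r_p


L_TANK = 186.5e-12   # henries
Q_TANK = 20.6        # assumed to hold at the oscillation frequency
F_OSC = 10.3125e9    # hertz

r_p = tank_parallel_resistance(Q_TANK, F_OSC, L_TANK)
gm_min = min_pair_gm(r_p)
print(f"R_p ~ {r_p:.0f} Ohm, minimum gm ~ {gm_min * 1e3:.2f} mS")
```

In practice the transconductance is chosen with a safety margin above this minimum to guarantee start-up across corners. In a CMOS topology the NMOS and PMOS pairs both contribute to the total transconductance.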

Based on some surveys on various designs we made a comparison between different topologies:

|                   | NMOS oscillator                                                                   | PMOS oscillator                                                                    | CMOS oscillator                                                                     |
|-------------------|-----------------------------------------------------------------------------------|------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
| Power consumption | Lower power consumption                                                           | High power consumption                                                             | Lowest power consumption                                                            |
| Phase noise       | Worst performance                                                                 | Low phase noise                                                                    | Low phase noise                                                                     |

It is clear from this comparison that the best choice is the CMOS oscillator topology.

### Current tail problem:

The first design of the circuit was based on the conventional CMOS topology shown in the previous figures. After running simulations on the design, the following results were obtained.



Figure 3.1.64 Frequency vs. Voltage



Figure 3.1.65 Phase noise

According to these results, the phase noise at 1 MHz offset could not be brought below -105 dBc/Hz, which is relatively high for this type of oscillator. After reviewing the noise report from the simulator, we observed that a significant portion of the noise came from the current tail. We therefore removed the current tail from the circuit, as shown in Figure 3.1.66, and the phase noise dropped significantly to -112 dBc/Hz at 1 MHz offset.

A comparison between the performance of the oscillator using the current tail and without using it:

|                      | Without current tail | With current tail  |
|----------------------|----------------------|--------------------|
| Average DC current   | 3.7 mA               | 3.09 mA            |
| Power consumption    | 3.7 mW               | 4.09 mW            |
| Current peaking      | 2.5 to 5.8 mA        | -4.135 to 8.577 mA |
| $K_{VCO}$            | 800 MHz/V            | 550 MHz/V          |
| Phase noise at 1 MHz | -112 dBc/Hz          | -105.2 dBc/Hz      |

It is clear from the comparison that the power consumption with the current tail is larger than without it, and that the phase noise profile improves when the current source is removed. We therefore decided to remove the current source in order to improve the performance of the oscillator.



Figure 3.1.66 Oscillator without current source

The following plots show the simulation results of the oscillator after settling on the appropriate design:



Figure 3.1.67 V-f Plot



Figure 3.1.68 Phase noise profile

Corners analysis is shown in the following figures:



Figure 3.1.69 Typical-typical V-F plot



Figure 3.1.70 Fast slow V-F plot



Figure 3.1.71 Slow-fast V-F plot



Figure 3.1.72 Fast fast V-f plot



Figure 3.1.73 Slow-slow V- F plot

It is clear from the previous curves that, across PVT variations, the required frequency of 10.3125 GHz remains within the range of the VCO.

Phase noise performance over corners is shown in the following figures:



Figure 3.1.74 Typical-typical phase noise profile



Figure 3.1.75 Fast slow phase noise profile



Figure 3.1.76 Slow fast phase noise profile



Figure 3.1.77 Slow slow phase noise profile



Figure 3.1.78 Fast fast phase noise profile

The phase noise does not exceed -110 dBc/Hz across corners. This value was used in a system-level analysis with CppSim to estimate the output jitter of the PLL; the estimated rms output jitter is 47 fs.

The following table shows the properties of the circuit components:

| Component         | Properties                                                                    |
|-------------------|-------------------------------------------------------------------------------|
| NMOS              | W/L = 105 µm / 360 nm                                                         |
| PMOS              | W/L = 105 µm / 300 nm                                                         |
| Capacitance range | 1.23 pF – 1.5 pF                                                              |
| Inductor          | Width = 9 µm, Radius = 40 µm, Turns = 1, L = 186.5 pH, Q factor = 20.6        |
| Inductor area     | Area width = 153 µm, Area length = 147.5 µm                                   |
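The listed inductance and capacitance range can be checked against the target frequency with the ideal tank resonance formula $f = 1/(2\pi\sqrt{LC})$. The sketch assumes that the quoted capacitance range is the total tank capacitance, including parasitics; if it is not, the actual frequencies would be lower.

```python
import math


def resonance_frequency(l, c):
    """Resonance frequency of an ideal LC tank, in hertz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l * c))


L_TANK = 186.5e-12   # henries
C_MIN = 1.23e-12     # farads
C_MAX = 1.5e-12      # farads
F_TARGET = 10.3125e9

f_high = resonance_frequency(L_TANK, C_MIN)   # smallest capacitance, highest frequency
f_low = resonance_frequency(L_TANK, C_MAX)    # largest capacitance, lowest frequency
print(f"tuning range ~ {f_low / 1e9:.2f} GHz to {f_high / 1e9:.2f} GHz")
print("target inside range:", f_low < F_TARGET < f_high)
```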

### 3.1.5 Divider:

#### 3.1.5.1 Introduction:

A frequency divider, also called a clock divider, scaler or prescaler, is a circuit that takes an input signal of frequency $f_{in}$ and generates an output signal of frequency:

$$f_{out} = \frac{f_{in}}{n}$$

where  $n$  is an integer. Phase-locked loop frequency synthesizers make use of frequency dividers to generate a frequency that is a multiple of a reference frequency.



Figure 3.1.79 The Divider in the PLL loop

Although both digital and analog implementations are possible, the digital dividers are more versatile and commonly used. They are easily programmable, allowing large and variable division coefficients.

### 3.1.5.2 Digital dividers :

The simplest digital divider is built using a D-type flip-flop, as shown in Figure 3.1.80. The flip-flop changes its output state with every rising edge of the incoming pulses, thus providing a divide-by-2 function. Such a divider with a fixed division coefficient is called a prescaler and is available at frequencies up to a few tens of gigahertz.



Figure 3.1.80 Divide by 2 D flip-flop
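The divide-by-2 behaviour can be reproduced with a small behavioural model. It assumes the usual connection in which the inverted output of the flip-flop is fed back to its D input; the function name and the sampled-clock representation are hypothetical.

```python
def divide_by_two(clock_samples):
    """Behavioural model of a D flip-flop with its inverted output fed back to D.

    clock_samples is a sequence of 0/1 clock levels; the returned list holds
    the flip-flop output Q for each sample.
    """
    q = 0
    previous = 0
    output = []
    for clk in clock_samples:
        if previous == 0 and clk == 1:   # rising edge: capture D = not Q
            q = 1 - q
        output.append(q)
        previous = clk
    return output


clock = [0, 1] * 4
print("clock :", clock)
print("output:", divide_by_two(clock))
```

The output changes state once per clock period, so its period is twice that of the input clock.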

### 3.1.5.3 Divider used

The divider takes the 10.3125 GHz output of the VCO and produces a 156.25 MHz signal, so the required division ratio is 66. Since 66 factors into the primes 11, 3 and 2, the division is carried out by a divide-by-11 stage, a divide-by-3 stage and a divide-by-2 stage.



Figure 3.1.81 The proposed Divider blocks
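The frequencies along the divider chain follow directly from the division ratios. The sketch below uses exact rational arithmetic and the stage order of the implemented divider described later in this chapter (divide by 2, then by 3, then by 11); the function name is hypothetical.

```python
from fractions import Fraction


def chain_frequencies(f_in, ratios):
    """Return the frequency after each divider stage, in hertz."""
    frequencies = []
    f = Fraction(f_in)
    for n in ratios:
        f = f / n
        frequencies.append(f)
    return frequencies


F_VCO = 10_312_500_000   # 10.3125 GHz
STAGES = (2, 3, 11)

for n, f in zip(STAGES, chain_frequencies(F_VCO, STAGES)):
    print(f"after divide-by-{n}: {float(f) / 1e6:.3f} MHz")
```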

### 3.1.5.4 D – FF:

The D flip-flop tracks the input, making transitions which match those of the input D. The D stands for "data"; this flip-flop stores the value that is on the data line. It can be thought of as a basic memory cell.

The D-Flip flop is the basic building block of the divider circuit. There are three types of the D-FF, which are Current Mode Logic (CML), True Single Phase Clock (TSPC) and CMOS flip flops.

#### (a) CML Latch

MCML (MOS current mode logic) in general consists of three main components: the load, the pull-down network (PDN) (the differential inputs and the cross-coupled transistors) and a constant current source. MCML is a completely differential logic, i.e. all signals and their complements are required. Depending on the logic implemented by the PDN, all the current flows through one of the two branches, providing complementary output signals. The voltage at the output of the branch with no current reaches VDD, whereas in the other branch some voltage drops across the load resistor and the output voltage becomes $V_{DD} - I_{bias} R_L$. MCML does not provide a rail-to-rail output swing. Due to the reduced swing, it has a smaller dynamic power dissipation. MCML circuits are faster than other logic families because they use NMOS transistors only. Due to its differential nature, MCML is highly immune to common-mode noise. It has an almost flat power curve over a wide range of frequencies, as opposed to other logic styles where the power consumption increases directly with frequency. Therefore, at very high frequencies its power consumption is comparable to or lower than that of other logic styles. This makes it a good choice for high-speed and low-power integrated circuit design.



Figure 3.1.82 CML latch

**(b) TSPC :**

The positive-edge-triggered TSPC DFF of Yuan and Svensson has been widely used. It is constructed from a P-C<sup>2</sup>MOS stage, an N-precharge stage, and an N-C<sup>2</sup>MOS stage. This TSPC approach can be extended to realize the ratioed DFF. When CLK=0, the flip-flop is in Hold Mode. Since MN3 is off in this mode, node b is precharged to VDD through MP3. Thus, since both MN4 and MP4 are off, the data at node Q is held. The P-C<sup>2</sup>MOS stage now functions as a pseudo-inverter, and the data D is transmitted to node a. When CLK=1, the flip-flop is in Evaluation Mode. If node a is 1 at this instant, node b is pulled down to 0 because MN2 and MN3 are turned on and MP3 is turned off, and then MP4 is on. At this time node C becomes 0 and MN6 is turned off, thus Q becomes 1. If node a is 0, node b is held at 1 because MN2 is off. Since node C is 1 and MN6 is turned on, Q becomes 0; i.e., the data at node b is transmitted to node Q.



Figure 3.1.83 TSPC D flip-flop

### 3.1.5.5 Logic Gates:

A logic gate is an elementary building block of a digital circuit. Most logic gates have two inputs and one output. At any given moment, every terminal is in one of the two binary conditions low (0) or high (1), represented by different voltage levels. The logic state of a terminal can, and generally does, change often, as the circuit processes data.

There are 2 types of gates used in the divider (a) Static CMOS gates & (b) Dynamic CMOS gates.

#### (a) Static CMOS gates :

A static CMOS gate is a combination of two networks: the pull-up network (PUN) and the pull-down network (PDN). The function of the PUN is to provide a connection between the output and Vdd whenever the output of the logic gate is supposed to be 1. Similarly, the PDN connects the output to Vss whenever the output is supposed to be 0.

The PUN and PDN networks are constructed in a mutually exclusive manner such that one and only one of the networks is conducting in steady state.

Static CMOS gates have a rail-to-rail swing and no static power dissipation. The speed of a static CMOS circuit depends on the transistor sizing and the various parasitics involved. The problem with this type of implementation is that an N fan-in gate requires $2N$ transistors, i.e. more area is required to implement the logic. This has an impact on the capacitance and thus on the speed of the gate.
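The complementary behaviour of the two networks can be illustrated with a logical model of a 2-input static CMOS NAND gate, in which the PDN is a series NMOS pair and the PUN is a parallel PMOS pair. The function names are hypothetical and the model is purely logical, with no timing or sizing.

```python
def nand2_static(a, b):
    """Logical model of a 2-input static CMOS NAND gate.

    The PDN (series NMOS) conducts only when both inputs are high; the PUN
    (parallel PMOS) conducts when at least one input is low. Exactly one of
    the two networks conducts in steady state.
    """
    pdn_on = bool(a) and bool(b)
    pun_on = (not a) or (not b)
    assert pdn_on != pun_on, "PUN and PDN must be mutually exclusive"
    return 1 if pun_on else 0


def static_cmos_transistor_count(fan_in):
    """Number of transistors of an N fan-in static CMOS gate (2N)."""
    return 2 * fan_in


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", nand2_static(a, b))
print("transistors for fan-in 2:", static_cmos_transistor_count(2))
```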

#### (b) Dynamic CMOS gates :

Dynamic CMOS circuits rely on the temporary storage of signal values on the capacitance of high-impedance circuit nodes. These circuits also have no static power dissipation and use a sequence of precharge and conditional evaluation phases, with the addition of a clock input.

The main advantages of the Dynamic CMOS logic are increased speed and reduced implementation area. Fewer devices are used to implement a given logic, this reduces the overall load capacitance and thus increases the speed.

The next table compares between the two types of logic gates

|               | Static CMOS                                                                                | Dynamic CMOS                                                                 |
|---------------|--------------------------------------------------------------------------------------------|------------------------------------------------------------------------------|
| Advantages    | Rail-to-rail output swing; no static power dissipation                                     | Only N+2 transistors for an N fan-in gate; works properly at high frequency  |
| Disadvantages | 2N transistors for an N fan-in gate; high output capacitance; works at low frequency only | Dynamic power dissipation                                                    |

#### **3.1.5.6 Achieved divider:**

In our work, we used three dividers: divide by 2, divide by 3 and divide by 11. Each block is described in the following:

##### **(I) Divide by 2**

The divide-by-2 block is the first block in our divider. It takes the output of the VCO, which is nominally 10 GHz (10.3125 GHz exactly), and gives an output of nominally 5 GHz (5.15625 GHz exactly), which is the input of the divide-by-3 block.



Figure 3.1.84 Divide by 2 block

##### **(II) Divide by 3**

The divide-by-3 block is the second block in our divider. It takes the output of the divide-by-2 block, which is nominally 5 GHz, and gives an output of nominally 1.66 GHz (1.71875 GHz exactly for a 10.3125 GHz VCO output), which is the input of the divide-by-11 block.



Figure 3.1.85 Divide by 3 block

##### **(III) Divide by 11**

The divide-by-11 block is the last block in our divider. It takes the output of the divide-by-3 block, which is nominally 1.66 GHz, and gives an output of 156.25 MHz, which is the feedback signal of the phase and frequency detector (PFD). A 4-bit counter, together with a reset logic circuit, is made to reset every 11 clock cycles. The 4-bit counter is shown in Figure 3.1.86.



Figure 3.1.86 4-bit up counter with j-k flip-flops

The logic circuit responsible for the final output of the divider is shown in Figure 3.1.87. It consists of 2 AND gates & 1 OR gate. It causes a T flip-flop to toggle at counts 0 & 6 of the counter.



Figure 3.1.87 Final output circuit of the divider
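The divide-by-11 behaviour described above can be checked with a behavioural model: a counter that wraps every 11 cycles and a T flip-flop that toggles at counts 0 and 6. The sketch assumes that the toggle takes effect in the same cycle in which the count is decoded and that both the counter and the flip-flop start at 0; gate-level details of the actual circuit are not modelled.

```python
def divide_by_eleven(n_cycles):
    """Output of a T flip-flop toggled at counts 0 and 6 of a modulo-11 counter."""
    count = 0
    q = 0
    output = []
    for _ in range(n_cycles):
        if count in (0, 6):       # decode logic: toggle at counts 0 and 6
            q = 1 - q
        output.append(q)
        count = (count + 1) % 11  # counter resets after 11 input cycles
    return output


waveform = divide_by_eleven(22)
print(waveform)
print("high cycles per period:", sum(waveform[:11]))
```

In this model the output repeats every 11 input cycles, staying high for 6 cycles and low for 5, so the duty cycle is close to but not exactly 50%.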

## Final Output Plots:



Figure 3.1.88 Output plot from each divider sub block

### 3.1.5.7 Power consumption:

The following table contains the power consumption after each block in the Divider.

|                   | Power consumption |
|-------------------|-------------------|
| Divide by 2       | 1.38 mW           |
| Divide by 6       | 3.32 mW           |
| The whole divider | 5.7 mW            |

### 3.1.5.8 Corners Analysis:

#### (a) Typical – Typical



Figure 3.1.89 Final output across Typical-typical corner



Figure 3.1.90 Phase noise across Typical-typical corner

#### (b) Fast – Fast



Figure 3.1.91 Final output across Fast-fast corner

#### (c) Slow – Slow



Figure 3.1.92 Final output across Slow-slow corner

#### (d) Slow – Fast



Figure 3.1.93 Final output across Slow-fast corner

#### (e) Fast – Slow



Figure 3.1.94 Final output across Fast-slow corner

# Chapter (2)

## Clock and Data Recovery

### 3.2.1 Overview on the system:

When the data is received it is both asynchronous and noisy. For subsequent processing, timing information must be extracted from the data so as to allow synchronous operations. Furthermore, the data must be retimed such that the jitter accumulated during transmission is removed. The task of clock extraction and data retiming is called “clock and data recovery” (CDR).

In order to perform synchronous operations such as retiming and demultiplexing on random data, receivers must generate a clock. As illustrated in Figure 3.2.1, a clock recovery circuit senses the data and produces a periodic clock. A D flip-flop driven by the clock then retimes the data, i.e. it samples the noisy data, yielding an output with less jitter. Note that the zero crossings of the received data are corrupted by noise and jitter, while those at the output of the flip-flop are as clean as the recovered clock itself.



Figure 3.2.1 A simple CDR system

The generated clock must satisfy three important conditions:

1. It must have a frequency equal to the data rate.
2. It must bear a certain phase relationship with respect to the data (sampling is optimum at the middle of the eye).
3. It must exhibit a small jitter, as it is the principal contributor to the retimed data jitter.

### 3.2.1.1 CDR architectures:

#### I) Single loop (PLL based):

While a PLL-based CDR is an effective timing recovery system, the power and area cost of having a PLL for each receive channel is prohibitive in some I/O systems. Another issue is that the CDR bandwidth is often set by jitter transfer and jitter tolerance specifications, leaving little freedom to optimize the PLL bandwidth to filter various sources of clock noise, such as VCO phase noise. The nonlinear behavior of binary phase detectors can also lead to a limited locking range, necessitating a parallel frequency acquisition loop. This motivates the use of dual-loop clock recovery, which allows several channels to share a global receiver PLL locked to a stable reference clock. It also provides two degrees of freedom: the CDR bandwidth can be set independently for a given specification, and the receiver PLL bandwidth can be optimized for best jitter performance.



Figure 3.2.2 Single loop CDR system

#### II) Dual loop architecture:

This CDR has a core loop which produces several clock phases for a separate phase recovery loop. The core loop can be a frequency synthesis PLL. An independent phase recovery loop provides flexibility in the input jitter tracking without any effect on the core loop dynamics and allows the clock generation to be shared with the transmit channels. The phase loop typically consists of a binary phase detector, a digital loop filter, and a finite state machine (FSM) which updates the phase muxes and interpolators to generate the optimal position for the receiver clocks.

In most conventional systems, clock recovery is performed by detecting the phase of the data from a slicer using a nonlinear (bang-bang) phase detector; an accumulator then drives a phase interpolator. The phase interpolator supplies a clock with the appropriate phase to the slicer, as shown in Figure 3.2.4.



Figure 3.2.4 Conventional Dual loop CDR

However, the implemented CDR system differs from this. The receiver architecture is ADC based, so the data is sampled and quantized into digital bits and no slicer is used. Direct acquisition of the data therefore cannot be implemented; instead, phase acquisition is implemented by processing certain bits of the ADC output.

Since the receiver is ADC based, the CDR architecture mixes analog and digital circuits. The system is based on a dual loop phase tracking architecture: the first loop is the PLL presented in the previous chapter, while the second loop is based on a phase interpolator with a digital control circuit.

The ADC is a 15-bit thermometer-coded flash ADC. To work properly, the ADC comparators are replicated four times to implement an interleaved structure in which each slice works at a reduced rate of 2.5 GHz. The comparators are fed by four clocks separated by 90 degrees of phase shift.

To obtain information about the phase, edge sampling must be performed and the edge samples compared to the data samples. The middle comparators represent the level of the signal (0 or 1), so comparing the middle samples is enough to detect the phase. Each edge clock must be generated with a 45 degree phase shift from the corresponding data clock. The following diagrams show the analogy between data and edge sampling at full rate and at quarter rate.



Figure 3.2.5 CDR system



Figure 3.2.6 Clocking scheme of CDR loop

There are 8 clocks: 4 clocks for the four middle comparators of the data and 4 clocks for four additional middle comparators that take the edge samples. Samples from the ADC comparators are demultiplexed to reduce the data rate to 1.25 Gbps so that the digital block can process them correctly. After the demultiplexer, the samples are processed by the digital control block, which estimates the early and late signals from the edge and data samples. Majority voting on these early and late signals produces a decision to increment or decrement the 7-bit accumulator. The output of the accumulator is decoded into thermometer code and then fed into the phase interpolators.

The phase interpolators have 4 clocks as input; I, Q, IBar and QBar. The phase interpolators produce clocks with phase depending on the input code-word from the accumulator of the digital control block. To generate the 8 clocks, we will have 8 PIs each shifted by 45 degrees from the other.
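The clocking arithmetic described above can be checked with a short sketch. It assumes the 10 Gb/s line rate and quarter-rate operation stated in the text; the function name and the convention that even-indexed phases are data clocks and odd-indexed phases are edge clocks are illustrative assumptions, not taken from the design.

```python
DATA_RATE_BPS = 10e9   # line rate stated in the text
INTERLEAVE = 4         # quarter-rate sampling
NUM_CLOCKS = 8         # 4 data clocks + 4 edge clocks


def clock_phases():
    """Return (role, phase in degrees, time offset in ps) for each sampling clock."""
    clock_hz = DATA_RATE_BPS / INTERLEAVE      # 2.5 GHz
    period_ps = 1e12 / clock_hz                # 400 ps
    step_deg = 360.0 / NUM_CLOCKS              # 45 degrees
    phases = []
    for k in range(NUM_CLOCKS):
        # Assumption: even indices are data clocks, odd indices are edge clocks.
        role = "data" if k % 2 == 0 else "edge"
        phase_deg = k * step_deg
        offset_ps = period_ps * phase_deg / 360.0
        phases.append((role, phase_deg, offset_ps))
    return phases


if __name__ == "__main__":
    for role, phase_deg, offset_ps in clock_phases():
        print(f"{role:4s}  {phase_deg:5.1f} deg  {offset_ps:5.1f} ps")
```

Under these assumptions each 45 degree step corresponds to 50 ps, which is half of the 100 ps unit interval at 10 Gb/s, so every edge clock falls midway between two data clocks.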

### 3.2.2 Digital Block

The increasing demand for data bandwidth in networking has driven the development of high-speed and low-cost serial link technology. A 65 nm CMOS technology is not fast enough to directly generate and receive a 10 Gb/s stream. Instead, parallelism is used to reduce the performance requirements of each circuit. A quarter-rate CDR circuit operates using eight phases of a clock running at a frequency equal to one-fourth of the data rate. As a result, the proposed architecture reduces the switching frequency to one-fourth, and hence the total dynamic power consumption is reduced to one-fourth. This leads to a low-power receiver circuit and hence a low-power, high-speed point-to-point serial link [1] [2] [3]. The architecture is based on performing quarter-rate phase detection as well as 1-to-4 demultiplexing.



Figure 3.2.7 Components of the digital block

The architecture of the digital block is illustrated in Figure 3.2.7. The receiver (RX) receives the data and determines the level of the input data using 15 comparators. The output of the comparators is demultiplexed to decrease the data rate. The data stream is sampled at four equally spaced clock phases for both the data (center) and the edge clocks, as shown in Figure 3.2.8. The samples are loaded into parallel D flipflops so that they enter the digital block aligned to the same clock. Multiple XOR gates are used to determine whether there is a transition in the data stream. Bang-bang CDRs are generally the largest source of errors in a high-speed serial link [4], and are interesting because they exhibit large amounts of both deterministic and random jitter. The logic circuitry driven by the flipflops generates the early and late control pulses for the phase interpolator. Because these control pulses are generated by clocked flipflops, they have a well-defined width. The advantage is that they do not depend on the data pattern. On the other hand, they do not reflect the amount of phase error either: the pulse width is constant, even for a very small phase error. The phase logic evaluates only rising signal edges so that it does not depend on duty cycle variations of the input signal.



Figure 3.2.8 Operation of phase detector

An edge is detected by sensing a transition in the data. The next step is to determine whether the transition in the incoming data stream is early or late. Here A denotes a data sampling time and B the edge sampling time between two consecutive data samples:

- If the data at sampling time B equals the data at the preceding sampling time A, the data transition is late (frequency up).
- If the data at sampling time B equals the data at the following sampling time A, the data transition is early (frequency down).
- If the data at sampling time A equals the data at the preceding sampling time A, there is no data edge and no control signal is output.

The decision to increase or decrease the accumulator that controls the phase interpolator (PI) is made by majority voting on the early and late signals, as explained in Table I. This architecture requires a voltage-controlled oscillator (VCO) or a phase interpolator (PI), both analog circuits, to adjust the phase of the sampling clocks [5]. The output to the phase interpolator is a 7-bit code word: the 5 LSBs are converted to a 31-bit thermometer code and the 2 MSBs are converted to Gray code.

Table I: Operation of the majority voting

| Shift Right | Shift Left | Action                                                                                        |
|-------------|------------|-----------------------------------------------------------------------------------------------|
| High        | Low        | **Late state:** shift the data stream to the right by decreasing the delay of the delay line. |
| Low         | High       | **Early state:** shift the data stream to the left by increasing the delay of the delay line. |
| High        | High       | **High impedance state:** no action                                                           |
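The early/late decision, the majority vote and the code word split described in this section can be sketched behaviorally in Python. This is not the Verilog used in the project. The function names, the choice that a late vote increments the accumulator, and the modulo-128 wrap-around are illustrative assumptions; only the decision rules, the 7-bit width, the 31-bit thermometer code and the Gray-coded MSBs come from the text.

```python
def early_late(prev_data, edge, next_data):
    """Return +1 for a late transition, -1 for early, 0 when there is no edge.

    prev_data and next_data are consecutive data (A) samples and edge is the
    (B) sample taken between them, each 0 or 1.
    """
    if prev_data == next_data:
        return 0                      # no transition, no control output
    return 1 if edge == prev_data else -1


def majority_vote(decisions):
    """Collapse a group of early/late decisions into +1, -1 or 0 (tie)."""
    total = sum(decisions)
    return (total > 0) - (total < 0)


def update_accumulator(acc, vote):
    """Assumption: late increments, early decrements, wrapping over 7 bits."""
    return (acc + vote) % 128


def encode(acc):
    """Split the 7-bit word into a 31-bit thermometer code and a 2-bit Gray code."""
    fine = acc & 0x1F                  # 5 LSBs: interpolation weight 0..31
    quadrant = (acc >> 5) & 0x3        # 2 MSBs: quadrant index 0..3
    thermometer = [1] * fine + [0] * (31 - fine)
    gray = quadrant ^ (quadrant >> 1)  # 0,1,2,3 -> 00,01,11,10
    return thermometer, gray


if __name__ == "__main__":
    acc = 31
    votes = [early_late(0, 0, 1), early_late(1, 1, 0), early_late(0, 1, 1)]
    acc = update_accumulator(acc, majority_vote(votes))
    thermometer, gray = encode(acc)
    print(acc, sum(thermometer), format(gray, "02b"))
```

In the usage example two of the three transitions are late, so the vote is +1 and the accumulator steps from 31 to 32. The thermometer code then returns to all zeros while the Gray-coded quadrant field changes from 00 to 01, so a single select bit changes at the quadrant boundary.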

## Simulation results

The digital block of the proposed clock and data recovery circuit is realized entirely in digital circuits. Consequently, the building blocks described above are written in Verilog HDL and synthesized using Xilinx tools. A simulation result is shown in Figure 3.2.9.



Figure 3.2.9 Result of the simulation

### 3.2.3 Phase interpolator:

#### 3.2.3.1 Introduction:

The phase interpolator is a dual input delay buffer which receives two clocks,  $\varphi'$  and  $\psi'$ , and generates the main clock  $\Theta$ . Ideally, the phase of clock  $\Theta$  is the weighted sum of the phases of clocks  $\varphi$  and  $\psi$ , which are delayed by a single buffer delay from the interpolator inputs. Interpolators with static phase mixing weights can be constructed by shunting the output of two half-sized CMOS or current mode delay buffers.



Figure 3.2.10 a) Model of PI b) Operation of PI

The interpolator used in this CDR system incorporates two D/A converters within the dual input buffer. They convert the digital weight code generated by the FSM into two complementary buffer currents which affect the phase of the output clock  $\Theta$ . A simplified model of the interpolator is depicted in Figure 3.2.11. In this model the switching action of the two buffers is modeled by applying the corresponding current to the output at a time controlled by the timing of the two input edges  $\varphi'$  and  $\psi'$ . The delay of the interpolator is intrinsically controlled by its output RC time constant. However, as illustrated in Figure 3.2.10(b), changing the currents of the two branches affects the overall delay by controlling the swing of the branch that switches first. The interpolator output voltage as a function of time is given by:

$$V_o(t) = V_{DD} + R \cdot I \left[ (1 - w) \cdot u(t) \cdot \left( e^{-\frac{t}{RC}} - 1 \right) + w \cdot u(t - \Delta t) \cdot \left( e^{-\frac{t-\Delta t}{RC}} - 1 \right) \right]$$



Figure 3.2.11 Simplified model of PI

where  $R$  is the total interpolator resistive load,  $C$  is the output capacitance,  $w$  is the interpolation weight, and  $\Delta t$  is the time delay between the two input phases. The equation shows that the interpolator delay depends not only on the interpolation weight but also on the time delay between the interpolator inputs. Using this equation, the interpolator transfer function (w-to-delay) can be derived.
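The w-to-delay transfer function can be evaluated numerically from the output-voltage expression. The sketch below is a minimal illustration under stated assumptions: time is normalized to the $RC$ constant, both exponential terms decay (exponents $-t/RC$ and $-(t-\Delta t)/RC$), and the delay is defined as the instant at which the output has completed half of its final swing $R \cdot I$. The 50 % threshold and all function names are assumptions made for this example.

```python
import math


def normalized_drop(t, w, dt):
    """(V_DD - V_o) / (R*I) at normalized time t, for weight w and input spacing dt."""
    first = (1.0 - w) * (1.0 - math.exp(-t)) if t >= 0.0 else 0.0
    second = w * (1.0 - math.exp(-(t - dt))) if t >= dt else 0.0
    return first + second


def crossing_time(w, dt, level=0.5):
    """Bisection for the time at which the drop reaches `level` of the swing."""
    lo, hi = 0.0, dt + math.log(2.0)   # the crossing always lies in this interval
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if normalized_drop(mid, w, dt) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def relative_delay(w, dt):
    """Interpolator delay referenced to the w = 0 case, in units of RC."""
    return crossing_time(w, dt) - crossing_time(0.0, dt)


def max_nonlinearity(dt, steps=32):
    """Largest deviation from the ideal straight line w*dt, as a fraction of dt."""
    worst = 0.0
    for k in range(steps + 1):
        w = k / steps
        worst = max(worst, abs(relative_delay(w, dt) - w * dt))
    return worst / dt


if __name__ == "__main__":
    for dt in (0.5, 1.0, 3.0):
        print(f"dt = {dt:.1f} RC  max deviation = {100 * max_nonlinearity(dt):.1f} % of dt")
```

With these assumptions the delay runs from 0 at $w=0$ to $\Delta t$ at $w=1$, and the deviation from a straight line grows as $\Delta t$ becomes larger than the $RC$ time constant.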

Figure 3.2.12 illustrates the transfer function, for varying values of  $\Delta t$ . Both  $\Delta t$  and the interpolator delay are normalized to the output  $RC$  time constant. Moreover, the interpolator delay in this figure is referenced to the delay of the circuit with  $w=0$ .



Figure 3.2.12 Interpolator transfer function with varying  $\Delta t$

Figure 3.2.12 shows that the interpolator transfer function becomes increasingly nonlinear as the delay between the two step inputs becomes larger than the  $RC$  time constant of the circuit. Although in a real implementation this nonlinearity would be mitigated by the finite slew rate of the input phases, it is a strong argument for retaining approximately the same  $RC$  time constant throughout the peripheral loop. This delay equalization not only increases the interpolator linearity, but also ensures that the interpolator output does not settle at half the final swing, a condition that would increase the jitter sensitivity.

### 3.2.3.2 Design implementation:

The input clock of the phase interpolator comes from the VCO of the PLL. Since the CDR system works at quarter rate, a frequency divider must be used to generate the quarter rate clock. A multiplexer then selects the appropriate quadrant for each phase. The core PI input is a 31-bit thermometer code coming from the digital control block, which ensures the monotonicity of the phase interpolator.



Figure 3.2.13 PI architecture

The divider is a current-mode logic (CML) static frequency divider whose schematic is shown in Figure 3.2.14. The divider in the figure divides by 2, so cascading two such dividers gives a divide-by-4 frequency divider. The divider has the advantage of using no passive devices, which reduces the area of the circuit. Another important advantage is that it produces the clock and the quadrature clock differentially, which gives the 4 phases I, IBar, Q and QBar by simply swapping the output terminals of the I and Q clocks.



Figure 3.2.14 Schematic of the divider

The output waveforms of the I and Q clocks have a large swing (Figure 3.2.15), which is not desirable in the PI. The MUX design therefore reduces this swing so that the phase interpolator is not driven into its nonlinear region.



Figure 3.2.15 Output waveform of the divider

The transistor sizes and the power consumption are listed in the following table.

| Device/property   | Value    |
|-------------------|----------|
| M1 – M2           | 800n/60n |
| M3 – M4           | 1u/60n   |
| M5 – M6           | 1u/60n   |
| M7 – M8           | 600n/60n |
| Power consumption | 0.93mW   |

The next block is a multiplexer. It is actually two 2-1 multiplexers: one for the I clock, to choose between I and IBar, and the other for the Q clock, to choose between Q and QBar. Together, the two multiplexers select the quadrant in which the interpolator is working. The selectors of the multiplexers must be Gray coded to obtain the right sequence of quadrants in the interpolator.

| Quadrant | S1 S0 |
|----------|-------|
| 1st      | 00    |
| 2nd      | 01    |
| 3rd      | 11    |
| 4th      | 10    |
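The reason for Gray coding the selectors can be illustrated with a small sketch. The quadrant codes are taken from the table above. The assignment of S0 to the I/IBar multiplexer and S1 to the Q/QBar multiplexer is an assumption chosen so that consecutive quadrants share one clock; the text does not state which bit drives which multiplexer.

```python
# Quadrant -> (S1 S0) select code, as listed in the table.
QUADRANT_CODES = {1: 0b00, 2: 0b01, 3: 0b11, 4: 0b10}


def select_clocks(code):
    """Return the (I-side, Q-side) clocks chosen by a 2-bit select code.

    Assumption: S0 drives the I/IBar mux and S1 drives the Q/QBar mux.
    """
    s1 = (code >> 1) & 1
    s0 = code & 1
    i_clock = "IBar" if s0 else "I"
    q_clock = "QBar" if s1 else "Q"
    return i_clock, q_clock


def bits_changed(a, b):
    """Number of select bits that differ between two codes."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    order = [1, 2, 3, 4, 1]   # walk around the full circle, including wrap-around
    for q_from, q_to in zip(order, order[1:]):
        a, b = QUADRANT_CODES[q_from], QUADRANT_CODES[q_to]
        print(q_from, select_clocks(a), "->", q_to, select_clocks(b),
              "bits changed:", bits_changed(a, b))
```

With this mapping the interpolator moves through the clock pairs (I, Q), (IBar, Q), (IBar, QBar) and (I, QBar). Only one multiplexer switches at each quadrant boundary, including the wrap from the 4th back to the 1st quadrant, so one input clock is always shared between adjacent quadrants.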

To operate at such high speed, the multiplexers are implemented as CML multiplexers, as shown in Figure 3.2.16. The output of the multiplexer has a swing of  $-V_{dd}/2$  to  $V_{dd}/2$ , as shown in Figure 3.2.17.



Figure 3.2.16 2-1 CML Multiplexer



Figure 3.2.17 Output waveform of MUX

The aspect ratios of the transistors and the power consumption are given in the following table.

| Device/property   | Value      |
|-------------------|------------|
| Diff pair         | 2.4μm/60nm |
| Select            | 1.2μm/60nm |
| Current mirror    | 29μm/240nm |
| Resistance        | 1.25kΩ     |
| Power consumption | 0.75mW     |

### 3.2.3.2 Core Phase interpolator:

The phase interpolator is a dual differential buffer with a resistive load. One buffer takes the in-phase clock as input and the other takes the quadrature-phase clock. The buffers are controlled by a 31-bit thermometer-coded code word that sets the interpolation weight, as shown in Figure 3.2.18. Thermometer code is used to ensure the monotonicity of the design.



Figure 3.2.18 Core Phase interpolator

Interpolation cannot occur between input clocks with sharp edges, so 180 fF capacitors were added at the input gates of the interpolator to slow the edges.

Several runs showed that the transfer function does not change across quadrants, as expected. Determining the transfer function is a time-consuming process that requires an advanced workstation to run all possible corners, so the results reported here cover the typical case and two extreme corners, as shown in the following figures.

Earlier results showed that the worst (most nonlinear) performance occurs in the ff corner, where the current in the current sources increases and leads to more nonlinearity. The design was therefore adjusted for the ff corner by increasing the capacitance to 180 fF, and the other corners consequently worked fine.



Figure 3.2.19 tt-27deg-1V

The response of the phase interpolator is almost linear, with an average step size of 2.72 degrees.



Figure 3.2.20 ff-0deg-1.1V



Figure 3.2.21 ss-125deg-0.9V

The following table lists the device sizes and power consumption of the core PI.

| Device/property    | Value      |
|--------------------|------------|
| Diff pair          | 7μm/60nm   |
| Switch transistors | 200nm/60nm |
| Current sources    | 8μm/360nm  |
| Power consumption  | 0.179mW    |

### 3.2.3.3 CML to CMOS converter:

The ADC comparators require CMOS clocks, whereas the phase interpolator outputs are differential and have a small swing in order to preserve the linearity of the phase interpolator. A CML to CMOS stage must therefore be introduced to feed the comparators with the appropriate clocks.

Figure 3.2.22 shows the circuit used to obtain a CMOS clock. A current steering adaptor converts the differential clock, but it does not reach full swing, so two inverter buffers are used to obtain the full swing clock.



Figure 3.2.22 The circuit used for a CMOS clock

The output waveform of the converter is shown in Figure 3.2.24.



Figure 3.2.24 Output waveform of the converter

The converter was integrated with the core PI to verify that linearity is preserved and that the transfer function is unchanged. The integrated circuit shows no significant change in the transfer function, as shown in Figure 3.2.25.



Figure 3.2.25 Transfer function after adding the converter

The following table shows the device sizes and power consumption.

| Device/property   | Value       |
|-------------------|-------------|
| Diff pair         | 23μm/60nm   |
| Load transistors  | 23μm/60nm   |
| Current sources   | 130μm/360nm |
| Inverter NMOS     | 6μm/60nm    |
| Inverter PMOS     | 6μm/60nm    |
| Power consumption | 1.6mW       |

## **References:**

- [1] AN-1001 An Analysis and Performance Evaluation of a Passive Filter Design Technique for Charge Pump PLL's
- [2] B. Razavi, Design of Analog CMOS Integrated Circuits, McGraw-Hill, 2001
- [3] B. Razavi, Design of Integrated Circuits for Optical Communications, McGraw-Hill, 2003.
- [4] J.Rogers, C.Plett and F.Dai, "Integrated Circuit Design for High-Speed Frequency Synthesis" ARTECH House 2006
- [5] Woogeun Rhee, "DESIGN OF HIGH-PERFORMANCE CMOS CHARGE PUMPS IN PHASE-LOCKED LOOPS" Conexant Systems, Inc. Newport Beach, California 92660, USA.
- [6] J. Eric Bracken, "Simulating Charge Injection in MOS Analog Circuits"
- [7] Joseph M. Ingino and Vincent R. von Kaenel, "A 4-GHz Clock System for a High-Performance System-on-a-Chip Design", IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001
- [8] Ian A. Young, Jeffrey K. Greason, and Keng L. Wong, "A PLL Clock Generator with 5 to 110 MHz of Lock Range for Microprocessors", IEEE JOURNAL OF SOLID-STATE CIRCUITS. VOL. 27, NO. 11 , NOVEMBER 1992
- [9] R.L. Bunch, "A Fully Monolithic 2.5GHz LC Voltage Controlled Oscillator in 0.35 μm CMOS Technology", Blacksburg, Virginia, 2001
- [10] Cadence, "VCO Design using SpectreRF Application Note", 2003
- [11] Richard C.Walker " Designing Bang-Bang PLLs for clock and Data recovery in Serial Data Transmission"
- [12] Xiang Gao, Klumperink, E.A.M. ; Geraedts, P.F.J. ; Nauta, B. "Jitter Analysis and a Benchmarking Figure-of-Merit for Phase-Locked Loops" Circuits and Systems II: Express Briefs, IEEE Transactions on (Volume:56 , Issue: 2 )

- [13] D.C.Lee "Analysis of jitter in phase-locked loops" Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on (Volume:49 , Issue: 11 ).
- [14] Sonntag, J.L.; Stonick, J., "A Digital Clock and Data Recovery Architecture for Multi-Gigabit/s Binary Links," Solid-State Circuits, IEEE Journal of , vol.41, no.8, pp.1867,1875, Aug. 2006 doi: 10.1109/JSSC.2006.875292
- [15] Greshishchev, Y.M.; Schvan, P.; Showell, J.L.; Mu-Liang Xu; Ojha, J.J.; Rogers, J.E., "A fully integrated SiGe receiver IC for 10-Gb/s data rate," Solid-State Circuits, IEEE Journal of , vol.35, no.12, pp.1949,1957, Dec. 2000
- [16] Savoj, J.; Razavi, B., "A 10 Gb/s CMOS clock and data recovery circuit with frequency detection," Solid-State Circuits Conference, 2001. Digest of Technical Papers. ISSCC. 2001 IEEE International , vol., no., pp.78,79, 7-7 Feb. 2001
- [17] Rogers, J.E.; Long, J.R., "A 10Gb/s CDR/DEMUX with LC delay line VCO in 0.18 μm CMOS," Solid-State Circuits Conference, 2002. Digest of Technical Papers. ISSCC. 2002 IEEE International , vol.2, no., pp.204,473, 7-7 Feb. 2002
- [18] Nogawa, M.; Nishimura, K.; Kimura, S.; Yoshida, T.; Kawamura, T.; Togashi, M.; Kumozaki, K.; Ohtomo, Y., "A 10 Gb/s burst-mode CDR IC in 0.13 μm CMOS," Solid-State Circuits Conference, 2005. Digest of Technical Papers. ISSCC. 2005 IEEE International , vol., no., pp.228,595 Vol. 1, 10-10 Feb. 2005
- [19] Ramezani, M.; Andre, C.; Salama, C.A.T., "A 10Gb/s CDR with a half-rate bang-bang phase detector," Circuits and Systems, 2003. ISCAS '03. Proceedings of the 2003 International Symposium on , vol.2, no., pp.II-181,II-184 vol.2, 25-28 May 2003
- [20] Stefanos Sidiropoulos "High-performance inter-chip signaling" PhD thesis, Stanford University, 1998.
- [21] Tyshchenko, O.; Sheikholeslami, A.; Tamura, Hirotaka; Kibune, M.; Yamaguchi, H.; Ogawa, J., "A 5-Gb/s ADC-Based Feed-Forward CDR in 65 nm CMOS," Solid-State Circuits, IEEE Journal of , vol.45, no.6, pp.1091,1098, June 2010

- [22] Harwood, M.; Warke, N.; Simpson, R.; Leslie, T.; Amerasekera, A.; Batty, S.; Colman, Derek; Carr, E.; Gopinathan, V.; Hubbins, S.; Hunt, P.; Joy, A.; Khandelwal, P.; Killips, B.; Krause, T.; Lytollis, S.; Pickering, A.; Saxton, M.; Sebastio, D.; Swanson, G.; Szczepanek, A.; Ward, T.; Williams, J.; Williams, R.; Willwerth, T., "A 12.5Gb/s SerDes in 65nm CMOS Using a Baud-Rate ADC with Digital Receiver Equalization and Clock Recovery," Solid-State Circuits Conference, 2007. ISSCC 2007. Digest of Technical Papers. IEEE International , vol., no., pp.436,591, 11-15 Feb. 2007
- [23] Jun Cao; Bo Zhang; Singh, U.; Delong Cui; Vasani, A.; Garg, A.; Wei Zhang; Kocaman, N.; Deyi Pi; Raghavan, B.; Hui Pan; Fujimori, I.; Momtaz, A., "21.7 A 500mW digitally calibrated AFE in 65nm CMOS for 10Gb/s Serial links over backplane and multimode fiber," Solid-State Circuits Conference - Digest of Technical Papers, 2009. ISSCC 2009. IEEE International , vol., no., pp.370,371,371a, 8-12 Feb. 2009
- [24] Abiri, B.; Sheikholeslami, A.; Tamura, Hirotaka; Kibune, M., "An Adaptation Engine for a 2x Blind ADC-Based CDR in 65 nm CMOS," Solid-State Circuits, IEEE Journal of , vol.46, no.12, pp.3140,3149, Dec. 2011
- [25] Hanumolu, P.K.; Gu-Yeon Wei; Un-Ku Moon, "A Wide-Tracking Range Clock and Data Recovery Circuit," Solid-State Circuits, IEEE Journal of , vol.43, no.2, pp.425,439, Feb. 2008
- [26] Wenjing Yin; Inti, R.; Elshazly, A.; Talegaonkar, M.; Young, B.; Hanumolu, P.K., "A TDC-Less 7 mW 2.5 Gb/s Digital CDR With Linear Loop Dynamics and Offset-Free Data Recovery," Solid-State Circuits, IEEE Journal of , vol.46, no.12, pp.3163,3173, Dec. 2011
- [27] Changhua Cao; O, K.K., "A power efficient 26-GHz 32:1 static frequency divider in 130-nm bulk CMOS," Microwave and Wireless Components Letters, IEEE , vol.15, no.11, pp.721,723, Nov. 2005
- [28] Wang, Huaide; Lee, Chao-Cheng; Lee, An-Ming; Jri Lee, "A 21-Gb/s 87-mW transceiver with FFE/DFE/linear equalizer in 65-nm CMOS technology," VLSI Circuits, 2009 Symposium on , vol., no., pp.50,51, 16-18 June 2009
- [29] Razavi, B., Design of Analog CMOS Integrated Circuits, McGraw-Hill, 2001.

- [30] Goldman, S., Phase-Locked Loop Engineering Handbook for Integrated Circuits.
- [31] Razavi, B., Design of Monolithic Phase-Locked Loops and Clock Recovery Circuits, New York: Wiley-IEEE Press, 1996.
- [32] Rogers, J., and Plett, C., Integrated Circuit Design for High-Speed Frequency Synthesis.
- [33] Best, R. E., Phase-Locked Loops (Design, Simulation, and Applications), New York: McGraw-Hill, Fourth ed.
- [34] Banerjee, D., PLL Performance, Simulation, and Design



# **65 nm**

## **High Speed Serial Link Transceiver**

**Serial link transceivers are widely used nowadays in different platforms from handheld devices to data centers. The power consumption of the transceivers is a key part of the platform power; especially as data rates keep increasing.**

**The main goal of the project is to design and implement a 10 Gb/s I/O transceiver. An ADC-based equalizer will be implemented in the receiver to achieve better performance in terms of jitter and bit-error-rate. Techniques like scalable supply and bias currents, current re-use, small TX output swing and optimized RX sensitivity will be evaluated and used when necessary to achieve the targeted power efficiency.**

**On the TX side, a serializer, pre-driver and driver will be implemented. Power management is to be used to have scalable supply and bias currents and to control the output swing. On the RX side, a VGA and CTLE will be implemented to relax the constraints on the ADC that follows. Digital equalization and deserialization will be implemented in the DSP. Synchronization blocks will be implemented on the TX and RX sides to ensure proper timing for best performance. Reduced TX swing reduces the clock loading and thus decreases the power consumption of the clock driver.**

### **Workgroup**

**Ahmed Elsaied, Ahmed Elsayed, Hisham Mubarak,  
Nasr Mahana, Mohamed Alaa, Mohamed Isa,  
Mohamed Megahed, Mohamed Rayay,  
Mustafa Naeem, Mostafa Mahmoud**