

# A 16-Gb/s 14.7-mW Tri-Band Cognitive Serial Link Transmitter With Forwarded Clock to Enable PAM-16/256-QAM and Channel Response Detection

Yuan Du, *Student Member, IEEE*, Wei-Han Cho, *Student Member, IEEE*, Po-Tsang Huang, *Member, IEEE*, Yilei Li, *Student Member, IEEE*, Chien-Heng Wong, Jieqiong Du, *Student Member, IEEE*, Yanghyo Kim, *Student Member, IEEE*, Boyu Hu, Li Du, *Student Member, IEEE*, Chunchen Liu, *Member, IEEE*, Sheau Jiung Lee, and Mau-Chung Frank Chang, *Fellow, IEEE*

**Abstract**—A cognitive tri-band transmitter (TX) with a forwarded clock using multiband signaling and high-order digital signal modulations is presented for serial link applications. The TX learns an arbitrary channel response by sending a continuous-wave sweep and detecting the power level at the receiver side, and then adapts the modulation scheme, data bandwidth, and carrier frequencies according to the detected channel information. The supported modulation schemes range from nonreturn-to-zero (NRZ)/quadrature phase shift keying (QPSK) to pulse-amplitude modulation (PAM)-16/256-quadrature amplitude modulation (QAM). The proposed highly reconfigurable TX is capable of dealing with low-cost serial channels, such as low-cost connectors, cables, or multidrop buses with deep and narrow notches in the frequency domain (e.g., a 40-dB loss at notches). The adaptive multiband scheme mitigates equalization requirements and enhances the energy efficiency by avoiding frequency notches and utilizing the maximum available signal-to-noise ratio and channel bandwidth. The implemented TX prototype consumes 14.7 mW and occupies 0.016 mm<sup>2</sup> in 28-nm CMOS. It achieves a maximum data rate of 16 Gb/s with a forwarded clock through one differential pair and the most energy-efficient figure of merit of 20.4 $\mu$W/Gb/s/dB, calculated as the power consumed per gigabit per second of transmitted data, divided by the worst case channel loss in decibels within the Nyquist frequency.

**Index Terms**—Cognitive, continuous-time linear equalization (CTLE), decision feedback equalization (DFE), digital modulation, energy efficiency, feedforward equalization (FFE), forwarded clock, intersymbol interference (ISI), memory interface, $\mu$W/Gb/s/dB, multiband signaling, multidrop bus (MDB), multilevel signaling, nonreturn to zero (NRZ), pulse-amplitude modulation (PAM), quadrature amplitude modulation (QAM), serial link, source synchronous, transmitter (TX), wireline.

Manuscript received August 10, 2016; revised October 6, 2016; accepted November 1, 2016. Date of publication December 2, 2016; date of current version March 23, 2017. This paper was approved by Guest Editor Brian Ginsburg. This work was supported in part by the Broadcom Foundation, in part by the Air Force Research Laboratory, and in part by the Defense Advanced Research Projects Agency (DARPA) under Grant FA8650-15-1-7519.

Y. Du, Y. Li, L. Du, and C. Liu are with the High Speed Electronics Laboratory, University of California, Los Angeles, CA 90095 USA, and also with Kneron, Inc., San Diego, CA 92121 USA.

W.-H. Cho, C.-H. Wong, J. Du, Y. Kim, B. Hu, and S. J. Lee are with the High Speed Electronics Laboratory, University of California, Los Angeles, CA 90095 USA.

P.-T. Huang and M.-C. F. Chang are with the High Speed Electronics Laboratory, University of California, Los Angeles, CA 90095 USA, and also with National Chiao Tung University, Hsinchu 300, Taiwan.

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2016.2628049

## I. INTRODUCTION

THE data rate of peripheral serial I/O for PC and mobile computing platforms continues to scale to meet high-bandwidth applications including high-resolution displays, camera sensors, and large-capacity external storage [1]. With ever-increasing data rates, signal and power integrity become more challenging because of various channel loss mechanisms, as well as discontinuities caused by vias, solder balls, packages, routing wire impedance mismatches, and connector or cable transitions, which set the upper bound on bandwidth capacity. Examples of such nonidealities are shown in Fig. 1(a) and (b) for the multidrop bus (MDB) channel of a memory interface and for low-cost peripheral serial I/Os with a connector and a cable, respectively. In a cable-only case, the dielectric and conduction loss would exhibit a simple low-pass characteristic, depicted by the dashed curve in Fig. 1(b). However, the packages, solder balls, bonding wires, vias, traces, and connectors make the complete channel suffer from a higher loss at certain frequencies, as depicted by the solid curve in Fig. 1(b). The phenomenon is more pronounced in low-cost packaging, printed circuit board (PCB), cable, and connector technologies. To make matters worse, the frequency response varies across different packages and PCB designs.

One obvious and straightforward solution to reduce such effects is to invest more resources in via, packaging, connector, and cable technologies [2]. Furthermore, depending on data rate requirements relative to the available channel bandwidth and the severity of potential noise sources, a comprehensive combination of equalization schemes, such as feedforward equalization (FFE), continuous-time linear equalization (CTLE), and decision feedback equalization (DFE), is employed at the transmitter (TX) side or receiver (RX) side [3]–[8]. While elegant and backed by rigorous mathematical proof and digital signal processing concepts, this approach inevitably increases the overall system complexity and total power consumption. Fig. 2(a) illustrates the common serial link TX and RX architecture with a comprehensive combination of all the above-mentioned equalization techniques: FFE at the TX side, and CTLE and DFE at the RX side.

Fig. 1. Two different channel conditions. (a) MDB channel for memory interface application. (b) Low-cost connector and cable channel for peripheral serial I/Os.

Fig. 2. Conventional comprehensive combination of equalization. (a) TRX architecture. (b) Insertion loss, single-bit response, and received eye diagram on MDB channel. (c) Insertion loss, single-bit response, and received eye diagram on low-cost cable channel.



Fig. 3. System architecture of proposed cognitive tri-band TX.

The system-level study reveals that if the conventional comprehensive combination of equalization schemes is used, much energy is wasted at the notch frequencies, because the equalizers must compensate for the worst case notch. Single-bit pulse responses of the two different channels in Fig. 2(b) and (c) indicate that both end up with very long tails [18 unit intervals (UIs) in Fig. 2(b) and 24 UIs in Fig. 2(c)] due to high-frequency loss and strong reflection at specific frequency notches. With the two given channels (MDB and low-cost cable/connector), even with 3-tap TX FFE, 1-tap CTLE, and 18/24-tap DFE, the horizontal and vertical eye openings are still very limited, as shown in Fig. 2(b) and (c). In this paper, a vertical opening of less than 100 mV is taken to mean a very limited signal-to-noise ratio (SNR), and a horizontal opening of less than 20 ps is taken to require a high-performance clock and data recovery (CDR) system to achieve a reasonable bit error rate (BER).

Meanwhile, recent research suggests that multiband signaling could be a promising solution for low-cost low-power serial interface systems [9]–[15]. By allocating modulating carrier frequencies and reshaping the transmitted power spectrum, energy is not wasted at the worst case notch frequencies. This makes multiband signaling particularly appealing for channel conditions with deep and narrow frequency notches. In [9] and [10], noncoherent pulse-amplitude modulation (PAM) or amplitude-shift keying schemes were utilized in tri-band and dual-band communications, which exploit only 1-D orthogonality in the frequency domain. Later, more advanced coherent quadrature phase shift keying (QPSK) or 16-quadrature amplitude modulation (QAM) schemes were presented in [11]–[14] to improve the spectral efficiency. However, all the previous works assumed specific channel conditions. Not only were the carrier frequencies fixed, but there was also no mechanism to obtain knowledge of the channel frequency response. In [16], a software programmable multitone TX was implemented. However, it required an 8-b high-speed digital-to-analog converter (DAC), resulting in power-hungry digital baseband circuits. In addition, that TX could not adapt to different channel conditions because it had no feedback mechanism to learn the channel response.

In this paper, we propose a cognitive tri-band serial link TX with a frequency response learning algorithm and the source synchronization feature without using a physical clock forwarding channel. We introduce the system-level architecture, design considerations, and cognitive algorithm in Sections II and III. Section IV illustrates the detailed analysis and design of TX building blocks. The implementation and characterization results and conclusions are presented in Sections V and VI, respectively.

## II. SYSTEM ARCHITECTURE OVERVIEW

In this section, the system architecture of the proposed cognitive tri-band TX is introduced. Then, the concept of multiband signaling is explained and compared with traditional baseband nonreturn-to-zero (NRZ) signaling.

### A. System Architecture

The block diagram of the proposed cognitive tri-band TX is shown in Fig. 3. A modulation mapping block maps the pseudorandom binary sequence (PRBS) binary code to its corresponding DAC input. For different modulation schemes, the 4-b DACs are supplied with different data patterns. The block also supplies dedicated data patterns in phase calibration mode and channel learning mode. Placed after the DACs, the analog and RF frontend includes two in-phase and quadrature RF band paths and one baseband path for clock forwarding. At the last stage, the signals from all bands are summed together and sent to the channel. A cognitive controller determines the modulation scheme and carrier frequency allocation based on the detected channel response. The cognitive controller also controls a carrier generation oscillator, either to select the carrier frequencies or to sweep the carrier frequency across the entire band of interest in channel learning mode. At the RX side, a power detector and a low-speed analog-to-digital converter (ADC) detect the channel response noncoherently and feed the channel information back to the cognitive controller at the TX side. After detection, the cognitive controller uses the received channel information to determine the carrier frequencies, calculate the link budget, and choose the optimum data bandwidth and modulation scheme.

Fig. 4. Concept of multiband signaling with PAM-8 and 64-QAM modulators.

### B. Concept of Multiband Signaling

The fundamental principle of multiband signaling is the same as that of a cable TV system or a wireless orthogonal frequency-division multiplexing (OFDM) system. However, both cable TV and wireless OFDM systems are relatively narrowband, while the serial interface is broadband. The channel conditions of the serial interface are also very different.

Fig. 4 shows PAM-8 and 64-QAM as an example. The data source consists of 15 parallel data streams, each running at 1 Gb/s. Three of them are modulated by the PAM-8 modulator, whose time-domain waveform remains in baseband but has multiple levels. Six of them are passed to a 64-QAM modulator, whose time-domain waveform is modulated onto the RF carrier frequency $f_1$. Similarly, the remaining six are modulated onto another RF carrier frequency $f_2$. All of these waveforms are then summed together. In the frequency domain, there are one baseband, one RF band at $f_1$, and another RF band at $f_2$.
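The stream split in Fig. 4 can be sketched in a few lines of Python. This is only an illustration: the natural-binary bit mapping, the level normalization to $[-1, 1]$, and the function names are assumptions made here, not the mapping used on the chip.

```python
import math

def bits_to_level(bits):
    """Map a sequence of bits (MSB first) to a symmetric PAM level in [-1, 1]."""
    n = len(bits)
    value = int("".join(str(b) for b in bits), 2)  # natural binary index (assumed)
    levels = 2 ** n
    return (2 * value - (levels - 1)) / (levels - 1)

def tri_band_sample(bits15, t, f1, f2):
    """Sum one PAM-8 baseband symbol and two 64-QAM symbols at time t (seconds).

    bits15 holds one bit from each of the 15 parallel streams:
    bits 0-2 -> PAM-8, bits 3-8 -> 64-QAM on f1, bits 9-14 -> 64-QAM on f2.
    """
    assert len(bits15) == 15
    out = bits_to_level(bits15[0:3])              # baseband PAM-8 level
    for fc, chunk in ((f1, bits15[3:9]), (f2, bits15[9:15])):
        i_level = bits_to_level(chunk[0:3])       # 3 bits on the in-phase rail
        q_level = bits_to_level(chunk[3:6])       # 3 bits on the quadrature rail
        phase = 2 * math.pi * fc * t
        out += i_level * math.cos(phase) - q_level * math.sin(phase)
    return out
```

Each 64-QAM symbol carries 3 bits on I and 3 bits on Q, so the three bands together carry 3 + 6 + 6 = 15 bits per symbol period.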

### C. Comparison of Multiband Signaling and Conventional Baseband NRZ Signaling

To understand multiband signaling more intuitively, we compare the simulated multiband results with those of conventional baseband-only NRZ signaling, assuming the same data rate requirement of 15 Gb/s over the MDB channel. As shown in Fig. 5, the energy of the conventional baseband NRZ signal is distributed relatively uniformly over the frequency band. When such a signal passes through the MDB channel with multiple frequency notches, signal distortion occurs. Severe reflections occur at the notch frequencies, leading to strong ISI and complete closure of the data eye. Complicated equalization with large power consumption is necessary to reopen the data eye. In the multiband signaling case, on the other hand, the energy distribution is reshaped on purpose based on the channel profile. After the demodulator, the data eyes are clearly open under the same data rate and channel condition assumptions.

Fig. 5. Comparison of multiband signaling and baseband NRZ signaling on MDB channel.

Another factor worth noting is that the time scales for the multiband and baseband-only cases are very different. In multiband signaling, the total bit stream is divided into multiple subbands, each of which operates at a much lower speed than the total bit rate. As a result, multiband signaling relaxes the design complexity and power consumption of the CDR system.

Fig. 6. Comparison of multiband and baseband NRZ signaling on “linear” loss channel.

### D. Self-Equalization Effect

Multiband signaling offers an additional benefit: the self-equalization effect, whose theory was first detailed in [14]. Taking a basic amplitude modulation (AM) signal as an example, the signal after upconversion is double sidebanded, and the two sidebands contain duplicated information. The channel loss in the upper sideband is typically higher than that in the lower sideband. After downconversion, the lower sideband can compensate for the loss of the upper sideband. If the loss is symmetric about the carrier frequency, the reconstructed signal is evenly attenuated over the broadband frequencies.
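A minimal numerical sketch of this sideband averaging follows. It assumes an ideal, phase-aligned coherent demodulator and a real-valued (zero-phase) channel gain, so that the baseband-equivalent gain at offset $f$ is the average of the gains at $f_c + f$ and $f_c - f$. The channel model with a linear amplitude slope is hypothetical and chosen so that the extra loss in the upper sideband mirrors the reduced loss in the lower sideband.

```python
def dsb_equivalent_gain(channel_gain, fc, f):
    """Baseband-equivalent gain at offset f after coherent downconversion.

    Assumes a phase-aligned demodulator and a real-valued channel gain,
    so the upper and lower sidebands simply average.
    """
    return 0.5 * (channel_gain(fc + f) + channel_gain(fc - f))

def sloped_channel(f_hz):
    """Hypothetical channel: linear amplitude gain falling by 0.1 per GHz."""
    return 1.0 - 0.1 * (f_hz / 1e9)

fc = 3e9
offsets = [0.0, 0.25e9, 0.5e9]
# Gain seen by a baseband signal: tilts downward with frequency.
baseband_view = [sloped_channel(f) for f in offsets]
# Gain seen by a double-sideband signal centered at fc: flat across offsets.
multiband_view = [dsb_equivalent_gain(sloped_channel, fc, f) for f in offsets]
```

Under these assumptions the double-sideband signal sees a constant gain of about 0.7 across the band, whereas the baseband signal sees a gain that falls from 1.0 to 0.95 over the same 0.5-GHz span.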

A system simulation is conducted to verify this self-equalization effect and to compare it with conventional baseband NRZ signaling, as shown in Fig. 6. Here, the MDB channel is replaced by a frequently used “linear” loss channel. This channel loss profile is due to electrostatic discharge protection, pad loading, the skin effect of the metal traces, and the dielectric loss of substrate materials. For baseband NRZ signaling, complicated and power-hungry equalization is again necessary to reopen the data eye. In contrast, in the multiband signaling case, the data eyes are still clearly open after demodulation.

### E. Source-Synchronization/Forwarded-Clock Architecture

A traditional source-synchronized or forwarded clock system is shown in Fig. 7. It reduces the power and complexity of clock generation and data recovery circuits, at the cost of a dedicated physical channel with clock I/O pins for clock forwarding. In contrast, multiband signaling, as shown in Fig. 3, benefits from source-synchronized or forwarded clock communication without the cost of an extra channel and clock I/O pins, since the baseband path in the multiband architecture can be configured for clock forwarding.

Fig. 7. Traditional source-synchronized or forwarded clock system architecture.

Fig. 8. Design specification: IBI.

In summary, multiband signaling can enable simultaneous and orthogonal communication channels in the frequency domain. It offers options to avoid channel frequency notches by carefully allocating carrier frequencies. Multiband signaling also works well with forwarded clock schemes without even increasing the number of channels and I/O pins.

## III. DESIGN SPECIFICATIONS AND COGNITIVE ALGORITHM

### A. Design Specifications

Interband interference (IBI) is a very important specification for a multiband system. For example, in Fig. 8, the 3-GHz band is the aggressor and the 6-GHz band is the victim. The victim band is located at the second-order harmonic of the aggressor, which is suppressed by differential signaling. In-band IBI is created by the sidelobe of the aggressor, and can only be reduced by pulse shaping or filtering after the DAC at the TX side. An 18-dB in-band IBI will be present if there is no pulse shaping or filtering function block after the DAC. In-band IBI can be improved to around 40 dB by simple *RC* low-pass filtering. Apart from in-band IBI, all other interference is considered out-of-band IBI, which can be rejected by the low-pass filter (LPF) at the RX side.

Fig. 9. Link budget calculation for different modulation schemes.

TABLE I  
LINK BUDGET CALCULATION SUMMARY

|                                         | QPSK   | 16-QAM   | 64-QAM   | 256-QAM  |
|-----------------------------------------|--------|----------|----------|----------|
| Required SNR (BER < 10<sup>-12</sup>)   | 17 dB  | 24 dB    | 30 dB    | 36 dB    |
| Bits per Symbol                         | 2      | 4        | 6        | 8        |
| Channel BW                              | 1 GHz  | 1 GHz    | 1 GHz    | 1 GHz    |
| Data Rate                               | 2 Gb/s | 4 Gb/s   | 6 Gb/s   | 8 Gb/s   |
| Required EVM (Norm. to Avg. Power)      | -3 dB  | -13.3 dB | -21.3 dB | -29.4 dB |

A link budget is calculated by the cognitive controller based on the BER requirement and the modulation scheme. As shown in Fig. 9, starting from the TX output power, the signal passes through the frequency-dependent lossy channel. On arriving at the RX input, the received signal power needs to be higher than the RX sensitivity, which is defined in dBm as [18]

$$P_{RX\_sen} = -174 \text{ dBm/Hz} + NF + 10\log B + SNR_{required} \quad (1)$$

where  $-174 \text{ dBm/Hz}$  is thermal noise floor at room temperature,  $NF$  is the RX noise figure in decibels,  $B$  is the signal bandwidth in hertz, and  $SNR_{required}$  is the required SNR in decibels for different modulation schemes.

With the detected channel loss information, the cognitive controller sets the TX output power level according to the link budget result below by tuning the unit current source in the DAC:

$$P_{TX} = P_{RX\_sen} + L_{CH} + Margin \quad (2)$$

$$P_{TX} = -174 \text{ dBm/Hz} + NF + 10 \log B + SNR_{required} + L_{CH} + Margin \quad (3)$$

where $P_{TX}$ is the TX output power in dBm, $L_{CH}$ is the channel loss in decibels at the frequency of interest, and Margin is the link budget margin in decibels. Table I summarizes the required SNRs, bits per symbol, data rates, and required error vector magnitudes (EVMs) (normalized to the average signal power) for QPSK, 16-QAM, 64-QAM, and 256-QAM at a BER of $10^{-12}$.
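Equations (1)–(3) translate directly into code. The sketch below takes the required SNR values from Table I; the noise figure, channel loss, and margin in the example call are illustrative placeholders, not measured values from this work.

```python
import math

# Required SNR in dB for BER < 1e-12, taken from Table I.
REQUIRED_SNR_DB = {"QPSK": 17.0, "16-QAM": 24.0, "64-QAM": 30.0, "256-QAM": 36.0}

def rx_sensitivity_dbm(nf_db, bandwidth_hz, snr_required_db):
    """Eq. (1): thermal floor + noise figure + 10*log10(B) + required SNR."""
    return -174.0 + nf_db + 10.0 * math.log10(bandwidth_hz) + snr_required_db

def tx_power_dbm(nf_db, bandwidth_hz, modulation, channel_loss_db, margin_db):
    """Eq. (3): RX sensitivity plus channel loss plus link margin."""
    sens = rx_sensitivity_dbm(nf_db, bandwidth_hz, REQUIRED_SNR_DB[modulation])
    return sens + channel_loss_db + margin_db

# Illustrative numbers only: NF, loss, and margin are assumed here.
example = tx_power_dbm(nf_db=10.0, bandwidth_hz=1e9, modulation="256-QAM",
                       channel_loss_db=20.0, margin_db=6.0)
```

With these assumed inputs, the RX sensitivity is $-174 + 10 + 90 + 36 = -38$ dBm and the required TX power is $-38 + 20 + 6 = -12$ dBm.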



Fig. 10. Noncoherent channel learning scheme.

### B. Noncoherent Channel Learning

As shown in Fig. 10, noncoherent channel learning is straightforward. The TX side sweeps the frequency over the bands of interest using an external oscillator, which is controlled by the cognitive controller. At the RX side, only one power sensor and one low-speed ADC are needed to extract useful channel information, such as notch frequencies, bandwidth, and frequency-dependent channel loss. In practice, another power sensor and low-speed ADC pair is necessary at the TX side to calibrate the frequency dependency of the TX output power level. This noncoherent detection extracts magnitude information only and provides no phase information. The channel learning process runs only once, at the beginning of data transfer. As long as the channel conditions remain stable during operation, there is no need for additional channel learning, and therefore, its power overhead during data transfer can be ignored.
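The feature extraction step can be sketched as follows. This is a hypothetical post-processing routine, not the controller's actual algorithm: it assumes the sweep yields one TX-side and one RX-side power reading per frequency point, and it flags a notch as a local loss maximum that stands at least `min_depth_db` above the lowest loss found on each side of it. Both function names and the threshold are assumptions.

```python
def channel_loss_db(tx_power_dbm, rx_power_dbm):
    """Per-frequency loss from the TX-side and RX-side detector readings."""
    return [tx - rx for tx, rx in zip(tx_power_dbm, rx_power_dbm)]

def find_notches(freqs_hz, loss_db, min_depth_db=10.0):
    """Return the frequencies of loss peaks deeper than min_depth_db."""
    notches = []
    for k in range(1, len(loss_db) - 1):
        is_peak = loss_db[k] > loss_db[k - 1] and loss_db[k] >= loss_db[k + 1]
        if not is_peak:
            continue
        # Depth is measured against the higher of the two side floors,
        # so a peak must stand out on both its left and its right.
        left_floor = min(loss_db[:k])
        right_floor = min(loss_db[k + 1:])
        depth = loss_db[k] - max(left_floor, right_floor)
        if depth >= min_depth_db:
            notches.append(freqs_hz[k])
    return notches

# Toy sweep with a single deep notch at 3 GHz (made-up readings).
freqs = [1e9, 2e9, 3e9, 4e9, 5e9]
loss = channel_loss_db([0.0] * 5, [-5.0, -8.0, -40.0, -10.0, -12.0])
notch_freqs = find_notches(freqs, loss)
```

Subtracting the TX-side reading before searching for notches reflects the calibration step described above, so that frequency-dependent TX output power is not mistaken for channel loss.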

### C. Cognitive Algorithm Design

The cognitive algorithm is illustrated in Fig. 11. The first step is the noncoherent channel learning described in Section III-B. The channel information is sent back through a low-cost low-speed single-ended channel. The cognitive controller extracts several important parameters, including the frequency notch locations, the available bandwidth in each band, and the channel loss profile over the entire band of interest. With the extracted channel information, the second step is to choose the carrier frequencies so as to avoid the high-loss notch frequencies, and to choose the modulation scheme based on the system data rate and BER requirements. In the third step, the cognitive controller calculates the link budget described in Section III-A and sets the TX output power. To do so, it looks up the required SNR for the chosen modulation scheme in a lookup table. In the last step, phase calibration is performed for each carrier frequency before data transmission starts.
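Steps 2 and 3 can be illustrated with a short sketch that picks, for one band, the highest-order modulation whose link budget fits under an available TX power. The lookup table comes from Table I; the `max_tx_dbm` limit, the function name, and the selection rule are assumptions made for illustration, since the text does not spell out the controller's decision logic.

```python
import math

# Lookup table from Table I: required SNR (dB) and bits per symbol.
REQUIRED_SNR_DB = {"QPSK": 17.0, "16-QAM": 24.0, "64-QAM": 30.0, "256-QAM": 36.0}
BITS_PER_SYMBOL = {"QPSK": 2, "16-QAM": 4, "64-QAM": 6, "256-QAM": 8}

def pick_modulation(loss_db, bandwidth_hz, nf_db, margin_db, max_tx_dbm):
    """Return (modulation, tx_power_dbm) with the most bits per symbol whose
    Eq. (3) link budget fits under max_tx_dbm, or None if QPSK does not fit."""
    best = None
    for name, snr_db in REQUIRED_SNR_DB.items():
        p_tx = (-174.0 + nf_db + 10.0 * math.log10(bandwidth_hz)
                + snr_db + loss_db + margin_db)
        fits = p_tx <= max_tx_dbm
        better = best is None or BITS_PER_SYMBOL[name] > BITS_PER_SYMBOL[best[0]]
        if fits and better:
            best = (name, p_tx)
    return best

# Assumed inputs: 20-dB band loss, 1-GHz bandwidth, 10-dB NF, 6-dB margin.
choice = pick_modulation(loss_db=20.0, bandwidth_hz=1e9, nf_db=10.0,
                         margin_db=6.0, max_tx_dbm=-15.0)
```

With these assumed inputs, 256-QAM would need $-12$ dBm and is rejected, so the sketch settles on 64-QAM at $-18$ dBm.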

### D. Delay Mismatch Analysis

There are two different types of delay mismatch among channels: 1) delay mismatch between physical channels and 2) delay mismatch between multibands.

The physical channel delay mismatch (between different differential channels on the cable or on the PCB) is caused by channel design and fabrication variations, as shown in Fig. 12(a). The proposed multiband signaling has an advantage over conventional schemes in dealing with this type of delay mismatch. The forwarded clock is embedded within the baseband in the frequency domain and travels with the data stream on exactly the same physical channel. The forwarded clock can therefore track the delay mismatch between different channels, because each physical channel has its own forwarded sampling clock.

- Step 1: Noncoherent channel learning
  - Channel quality acquisition: frequency sweeping by CW from 0.1 GHz to 10 GHz at the TX; power detection at the RX
  - Channel feature extraction: frequency notches; 3-dB bandwidth of the baseband (BW<sub>0</sub>); carrier frequencies (f<sub>c*i*</sub>); bandwidth of each band (BW<sub>*i*</sub>); insertion loss of each band (L<sub>c*i*</sub>)
- Step 2: Carrier/modulation decision
- Step 3: Output power selection
- Step 4: Phase calibration for each band

Fig. 11. Cognitive algorithm.

For the delay mismatch between different frequency bands (within the same differential traces), a more careful group delay analysis over all the bands used in multiband signaling is necessary. The main contribution comes from the channel condition and the impedance matching quality. Due to the relatively low symbol rate, the group delay variance from the TX on-chip circuits can be ignored. Taking the MDB channel as an example, as shown in Fig. 12(b), the worst case group delay variance occurs at the notch frequencies. If these notch frequency bands were used for data transmission, the eye diagram quality would be degraded not only by a large loss in magnitude response but also by a large in-band group delay variance in phase response. In contrast, the group delay is within $\pm 100$ ps around the baseband, 3-GHz, and 6-GHz bands. To achieve the aggregated 16-Gb/s data rate, the symbol rate within one subband is only 1 Gbaud, so the horizontal eye period is 1 ns, which is ten times the worst case in-band group delay variance. The situation is similar for the other channel condition, the low-cost cable channel, as shown in Fig. 12(c). Thus, no delay tuning function is necessary for the proposed multiband signaling architecture. However, it might become necessary if the symbol rate increases further.

Fig. 12. Delay mismatch analysis for different bands. (a) Different physical channel delay mismatch. (b) MDB channel insertion loss and group delay. (c) Low-cost cable channel insertion loss and group delay.

## IV. CIRCUIT DESIGN

A fully differential current-mode architecture is utilized for all the circuit-level designs to suppress common mode and other even-order harmonics. It also mitigates simultaneous switching noise and supply and electromagnetic noise.

A 4-b DAC and a double-balanced mixer are combined to improve energy efficiency, as shown in Fig. 13. The 4-b DAC is a current steering structure whose output current ranges from 20 to 950 $\mu$A with a peak voltage swing of around 100 mV. A 1.2-V power supply is chosen instead of the 0.9-V standard core voltage in 28-nm CMOS to provide more linearity headroom. A capacitor is added at the DAC's output and serves as a bandwidth limiter to alleviate the in-band IBI issue, as explained in Section III. The double-balanced mixer is composed of four passive switches so that the TX output power is proportional to the DAC's output current. The unit bias current of the DAC is digitally tunable and is set by the cognitive controller based on the link budget calculation and energy efficiency optimization.

Fig. 13. 4-b DAC and double-balanced mixer schematic.

Fig. 14. Summation block schematic.

The summation block consists of five slices, as shown in Fig. 14: four for the I/Q paths of the two RF bands ($f_1$ and $f_2$) and one for the baseband. A termination resistor with a switch is attached in series at the output to improve channel characteristic impedance matching if necessary. The block needs to sum the signals from all bands and provide broadband operation up to 8 GHz. It also needs to subtract the dc current to avoid desensitizing the RX frontend.

The two $1\times$-size pMOSs mirror the differential input currents and sum them to sense the dc current according to

$$I_{1\times\,\text{nMOS}} = 0.1 \times (I_P + I_N) = 0.2 \times I_{dc} \quad (4)$$

where $I_{1\times\,\text{nMOS}}$ is the current in the $1\times$-size nMOS, $I_P$ and $I_N$ are the differential input currents, and $I_{dc}$ is the input dc current. Then, $I_{1\times\,\text{nMOS}}$ is copied by the $5\times$-size nMOS, so that $I_{dc}$ is subtracted from the input current.
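The factor of 0.2 in (4) can be checked with a short derivation. This is a sketch that reads the mirror ratios as 0.1 (input to the $1\times$ devices) and 5 (from the $1\times$ to the $5\times$ nMOS); $i_{sig}$ is a symbol introduced here for the differential signal current and does not appear in the original text.

$$I_P = I_{dc} + \tfrac{1}{2} i_{sig}, \qquad I_N = I_{dc} - \tfrac{1}{2} i_{sig} \quad \Rightarrow \quad I_P + I_N = 2 I_{dc}$$

$$I_{1\times\,\text{nMOS}} = 0.1 \times 2 I_{dc} = 0.2 \times I_{dc}, \qquad I_{5\times\,\text{nMOS}} = 5 \times 0.2 \times I_{dc} = I_{dc}$$

Under this reading, the signal component cancels in the sum, and the current in the $5\times$ device equals the dc current that is removed from the input.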

## V. IMPLEMENTATION AND MEASUREMENT RESULTS

A test chip comprising carrier generation, a digital baseband controller, and the tri-band frontend is fabricated in a 28-nm CMOS process and occupies an area of $0.016\text{ mm}^2$. The data source is a 16-b parallel PRBS generator operating at up to 1 GHz. A universal asynchronous receiver/transmitter (UART) interface is used to configure the control registers and monitor the TX operation status. As shown in Fig. 15, a commercial power detector LMX2492 EVM with a 12-b ADC is used to detect the received power through the channels from 100 MHz to 10 GHz during TX frequency sweeping. The detected channel frequency response is processed by a MachXO3L FPGA board, on which the cognitive algorithm determines the carrier frequency allocation, modulation schemes, maximum achievable data rate, and other reconfigurable parameters. Two different channel conditions are tested: a 10-in low-cost differential cable by 3M and an MDB modeled by an open-stub transmission line on the PCB. For the RX side, a broadband power splitter (WSCH 1579), downconversion mixers (MZ6310C), a broadband $90^\circ$ hybrid (KRYTAR1230), LPFs (SBLP 933), amplifiers (CRBAMP100), and an HP 83460A as a local oscillator constitute an instrumental RX that coherently demodulates the TX output signal.

The frequency-domain measurement analysis is shown in Fig. 16. The first column is the TX output spectrum before the signal passes through the channel. The second column is the RX input spectrum after the signal passes through the channel. The aggregated data rate is 16 Gb/s, and the baseband is configured for clock forwarding, sending a half-rate clock. In Fig. 16(a), the cognitive controller learns the MDB channel information and then shapes the TX spectrum accordingly. The main lobe shape is well maintained after the channel. In Fig. 16(b), however, the MDB channel is replaced by a low-cost cable channel and the channel learning feature of the cognitive controller is disabled. When the same TX spectrum is sent, the main lobe energy and information are corrupted after the channel. In Fig. 16(c), the channel learning option is enabled and the cognitive controller chooses the carrier frequencies and data bandwidth based on the channel information. The main lobe after the channel is well maintained again. Although the two channel conditions are very different, the proposed cognitive serial TX is able to learn the channel information and use it to optimize its configuration adaptively.

The time-domain measurement results are shown in Fig. 17, which demonstrates the I/Q constellations and eye diagrams for QPSK, 16-QAM, 64-QAM, and 256-QAM modulation. The forwarded clock can be used directly to sample data without the need for a phase-locked loop (PLL)-based CDR. A $-30$-dB EVM is achieved, and the IQ mismatch is calibrated at the RX side. The proposed cognitive tri-band TX achieves 16 Gb/s without any equalization or PLL-based CDR. The eye diagram and constellation of 256-QAM are marginal, with a BER of $10^{-5}$, which is limited by the noise floor of the available instruments. For all the other modulation schemes, the BER is less than $10^{-12}$.

Fig. 15. Measurement platform.

TABLE II  
PERFORMANCE COMPARISON WITH OTHER STATE-OF-THE-ART WORKS

|                                             | VLSI'15 [6]                      | VLSI'15 [7]           | VLSI'15 [8]           | JSSC'15 [11]          | This work                         |
|---------------------------------------------|----------------------------------|-----------------------|-----------------------|-----------------------|-----------------------------------|
| Technology                                  | 22-nm CMOS                       | 28-nm CMOS            | 65-nm CMOS            | 40-nm CMOS            | 28-nm CMOS                        |
| Data rate/diff. pair                        | 8 Gb/s                           | 13 Gb/s               | 14 Gb/s               | 7.5 Gb/s              | 16 Gb/s                           |
| Signaling                                   | Baseband NRZ                     | Baseband NRZ          | Baseband NRZ          | Bi-band NRZ/QPSK      | Tri-band QPSK/16/64/256-QAM       |
| Clock Synchronization Scheme                | Forwarded clock w/ extra channel | Embedded clock        | Embedded clock        | Embedded clock        | Forwarded clock w/o extra channel |
| Area/Lane                                   | --                               | 0.028 mm<sup>2</sup>  | 0.061 mm<sup>2</sup>  | 0.051 mm<sup>2</sup>  | 0.016 mm<sup>2</sup>              |
| Power                                       | 2.56 mW                          | 17.0 mW               | 12.5 mW               | 7.4 mW                | 14.7 mW                           |
| Efficiency                                  | 320 fJ/bit                       | 1308 fJ/bit           | 893 fJ/bit            | 990 fJ/bit            | 919 fJ/bit                        |
| Worst Channel Loss within Nyquist Freq.     | 12 dB                            | 35 dB                 | 12 dB                 | 45 dB                 | 45 dB (Cable)<br>40 dB (MDB)      |
| FoM ( $\mu\text{W}/\text{Gb/s}/\text{dB}$ ) | 26.7                             | 37.4                  | 74.4                  | 22.0                  | 20.4 (Cable)<br>23.0 (MDB)        |
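The efficiency and FoM rows of Table II follow from the power, data rate, and worst case loss rows. The sketch below reproduces the arithmetic for this work; the function names are chosen here for illustration.

```python
def energy_per_bit_fj(power_mw, data_rate_gbps):
    """mW / (Gb/s) equals pJ/bit; multiply by 1000 to get fJ/bit."""
    return power_mw / data_rate_gbps * 1000.0

def fom_uw_per_gbps_per_db(power_mw, data_rate_gbps, worst_loss_db):
    """Power per Gb/s (in uW) divided by the worst case channel loss in dB."""
    return power_mw * 1000.0 / data_rate_gbps / worst_loss_db

efficiency = energy_per_bit_fj(14.7, 16.0)               # this work
fom_cable = fom_uw_per_gbps_per_db(14.7, 16.0, 45.0)     # 45-dB cable channel
fom_mdb = fom_uw_per_gbps_per_db(14.7, 16.0, 40.0)       # 40-dB MDB channel
```

Because 1 $\mu$W/Gb/s equals 1 fJ/bit, the FoM is simply the efficiency row divided by the worst case loss row.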

The BER is estimated based on the distribution of demodulated signals on the received I/Q constellation. Taking one of received signal points, errors occur when the received phasor sample falls outside a symbol boundary. Assuming the noise is Gaussian distribution, the addition of Gaussian noise creates a distribution of sample points about the mean of “ideal” symbol point. The probability density function area under the curve beyond the symbol boundary represents the probability of that type of error. The error probability can be calculated by integrating the area from the symbol boundary to minus infinity

$$P(x < a) = \int_{-\infty}^a \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] dx \quad (5)$$

where  $a$  is the decision boundary,  $\mu$  is the mean value of a group of received symbols, and  $\sigma$  is their standard deviation.
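As an illustration only, the Gaussian tail integral in (5) can be evaluated in closed form with the complementary error function. The sketch below is not the authors' measurement script; the function name `tail_probability` and the example numbers (a symbol at 1.0, a boundary at 0.5, and a standard deviation of 0.1) are assumptions chosen for demonstration.

```python
import math


def tail_probability(a: float, mu: float, sigma: float) -> float:
    """Return P(x < a) for x ~ N(mu, sigma^2), i.e. the integral in (5).

    Uses the identity P(x < a) = 0.5 * erfc((mu - a) / (sigma * sqrt(2))).
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.erfc((mu - a) / (sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical numbers: ideal symbol at 1.0, lower decision boundary at 0.5,
    # noise standard deviation 0.1 (a five-sigma margin to the boundary).
    p_err = tail_probability(a=0.5, mu=1.0, sigma=0.1)
    print(f"one-sided error probability: {p_err:.2e}")  # roughly 2.9e-7
```

This covers a single boundary on one axis. A full symbol error estimate for a QAM constellation would combine the tails across every boundary of each symbol, which this sketch does not attempt.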

The noise figures of the frontend splitter, passive mixer, LPF, and analog baseband amplifier are 6.5, 7, 1.2, and 3.5 dB, respectively. The maximal resolution of the oscilloscope is 8 b. Based on the specifications of the instruments and discrete components, the maximal SNR that can be measured is 31.7 dB

$$\text{Max. RX input SNR} = 8 \times 6.02 + 1.76 - \text{NF}_{\text{LPF}} - \text{NF}_{\text{MIXER}} - \text{NF}_{\text{AMP}} - \text{NF}_{\text{SPLITTER}}. \quad (6)$$

As shown in Fig. 9, the BER for 256-QAM changes from  $10^{-4}$  to  $10^{-12}$  as the SNR increases from 32 to 37 dB. Consequently, the measured BER of  $10^{-5}$  is a reasonable result that matches the calculation.
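The SNR budget in (6) can be checked numerically. The following is a minimal sketch using only the figures quoted above; the function name `max_rx_input_snr_db` and the dictionary keys are illustrative and not taken from the paper.

```python
def max_rx_input_snr_db(resolution_bits, noise_figures_db):
    """Ideal quantization SNR (6.02*N + 1.76 dB) minus the cascaded noise figures, per (6)."""
    quantization_snr_db = 6.02 * resolution_bits + 1.76
    return quantization_snr_db - sum(noise_figures_db)


# Noise figures in dB as quoted in the text for the measurement chain.
noise_figures_db = {
    "splitter": 6.5,
    "mixer": 7.0,
    "lpf": 1.2,
    "amp": 3.5,
}

snr_db = max_rx_input_snr_db(8, noise_figures_db.values())
print(f"Max. RX input SNR = {snr_db:.1f} dB")  # prints "Max. RX input SNR = 31.7 dB"
```

As in (6), the noise figures are simply subtracted in decibels; a Friis-style cascade weighted by stage gains is outside the scope of this check.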

The die photo and power consumption breakdown are presented in Fig. 18(a) and (b), respectively. The total core area is  $0.016 \text{ mm}^2$  in the 28-nm CMOS technology, with  $40 \mu\text{m} \times 300 \mu\text{m}$  for the analog frontend,  $50 \mu\text{m} \times 40 \mu\text{m}$  for digital control/data generation, and  $50 \mu\text{m} \times 40 \mu\text{m}$  for the clock generation related circuitry. The total power consumption is 14.7 mW, 34% of which is consumed in the summation block, which interfaces with the off-chip environment and handles broadband operation up to 8 GHz. The power consumption of the controller is relatively small because it runs at only several tens of megahertz for the initial configuration or calibration.

Fig. 16. Frequency-domain measurement analysis. (a) MDB channel with enabled channel learning. (b) Low-cost cable channel with disabled channel learning. (c) Low-cost cable channel with enabled channel learning.

Fig. 17. Time-domain measurement results.

Fig. 18. (a) Die photo of test chip. (b) Power consumption breakdown.

Table II compares the measured silicon performance with state-of-the-art serial interface TXs. Compared with the other works, this work achieves 16 Gb/s per differential pair with an energy efficiency of 919 fJ/b and FoMs of 20.4 and 23.0  $\mu\text{W}/\text{Gb/s}/\text{dB}$  for the low-cost cable channel and the MDB channel, respectively. The forwarded clock scheme is used without an extra physical channel or extra clock IO pins. The last two rows in Table II, worst channel loss (dB) within the Nyquist frequency and FoM ( $\mu\text{W}/\text{Gb/s}/\text{dB}$ ), are both related to the channel condition.
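The efficiency and FoM entries for this work follow directly from the reported power, data rate, and worst-case channel loss. The short sketch below reproduces them; the helper names are illustrative only. Note that 1 $\mu$W/(Gb/s) equals 1 fJ/b, so the FoM is the energy per bit divided by the channel loss in decibels.

```python
def energy_per_bit_fj(power_mw, rate_gbps):
    """Energy efficiency in fJ/b: mW / (Gb/s) gives pJ/b, times 1000 gives fJ/b."""
    return power_mw / rate_gbps * 1000.0


def fom_uw_per_gbps_per_db(power_mw, rate_gbps, loss_db):
    """FoM in uW/Gb/s/dB: power per Gb/s, normalized by worst-case channel loss."""
    return power_mw * 1000.0 / rate_gbps / loss_db


POWER_MW = 14.7    # total TX power reported in the text
RATE_GBPS = 16.0   # maximum data rate per differential pair

print(f"efficiency: {energy_per_bit_fj(POWER_MW, RATE_GBPS):.0f} fJ/b")  # 919 fJ/b
for channel, loss_db in (("cable", 45.0), ("MDB", 40.0)):
    fom = fom_uw_per_gbps_per_db(POWER_MW, RATE_GBPS, loss_db)
    print(f"FoM ({channel}): {fom:.1f} uW/Gb/s/dB")  # 20.4 (cable), 23.0 (MDB)
```

The same division reproduces the other columns of Table II from their listed efficiency and loss, for example 320 fJ/bit over 12 dB gives about 26.7.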

## VI. CONCLUSION

In conclusion, a tri-band cognitive TX is implemented in the 28-nm CMOS technology. It demonstrates the unique capability of learning an arbitrary channel response and adapting the modulation scheme from NRZ or QPSK to PAM-16 or 256-QAM. It achieves a 16-Gb/s data rate over both the MDB and the low-cost cable channels without using equalization. It also uses the source-synchronous, or forwarded, clock scheme without increasing the number of clock pins or channels. It attains the best FoM of  $20.4 \mu\text{W}/\text{Gb/s}/\text{dB}$  among the compared works and occupies an area of  $0.016 \text{ mm}^2$ .

## ACKNOWLEDGMENT

The authors would like to thank Dr. A. Momtaz of Broadcom Corporation for valuable advice and TSMC for the chip fabrication. They would also like to thank M. Zhu for help with assembly and testing at the Center for High Frequency Electronics, Electrical Engineering Department, UCLA.

This material is based on the research partially sponsored by Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under agreement FA8650-15-1-7519. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFRL and DARPA or the U.S. Government.

## REFERENCES

- [1] J. Jaussi *et al.*, “A 205 mW 32 Gb/s 3-tap FFE/6-tap DFE bidirectional serial link in 22 nm CMOS,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2014, pp. 440–441.
- [2] D. G. Kam and J. Kim, “40-Gb/s package design using wire-bonded plastic ball grid array,” *IEEE Trans. Adv. Packag.*, vol. 31, no. 2, pp. 258–266, May 2008.
- [3] S. Palermo, *CMOS Nanoelectronics Analog and RF VLSI Circuits. Chapter 9: High-Speed Serial I/O Design for Channel-Limited and Power-Constrained Systems*. New York, NY, USA: McGraw-Hill, 2011.
- [4] K. L. J. Wong, H. Hatamkhani, M. Mansuri, and C. K. K. Yang, “A 27-mW 3.6-Gb/s I/O transceiver,” *IEEE J. Solid-State Circuits*, vol. 39, no. 4, pp. 602–612, Apr. 2004.
- [5] J. F. Bulzacchelli *et al.*, “A 28-Gb/s 4-tap FFE/15-tap DFE serial link transceiver in 32-nm SOI CMOS technology,” *IEEE J. Solid-State Circuits*, vol. 47, no. 12, pp. 3232–3248, Dec. 2012.
- [6] R. Inti *et al.*, “A 0.5-to-0.75V, 3-to-8 Gbps/lane, 385-to-790 fJ/b, bi-directional, quad-lane forwarded-clock transceiver in 22nm CMOS,” in *Proc. Symp. VLSI Circuits (VLSI Circuits)*, Kyoto, Japan, Jun. 2015, pp. C346–C347.
- [7] T. Ali *et al.*, “A 3.8 mW/Gbps quad-channel 8.5–13 Gbps serial link with a 5-tap DFE and a 4-tap transmit FFE in 28 nm CMOS,” in *Proc. Symp. VLSI Circuits (VLSI Circuits)*, Kyoto, Japan, Jun. 2015, pp. C348–C349.
- [8] S. Saxena *et al.*, “A 2.8mW/Gb/s 14Gbps serial link transceiver in 65nm CMOS,” in *Proc. Symp. VLSI Circuits (VLSI Circuits)*, Kyoto, Japan, Jun. 2015, pp. C352–C353.
- [9] S.-W. Tam, E. Socher, A. Wong, and M.-C. F. Chang, “A simultaneous tri-band on-chip RF-interconnect for future network-on-chip,” in *Proc. Symp. VLSI Circuits*, Kyoto, Japan, Jun. 2009, pp. 90–91.
- [10] Y. Kim *et al.*, “An 8Gb/s/pin 4pJ/b/pin single-T-line dual (base+RF) band simultaneous bidirectional mobile memory I/O interface with inter-channel interference suppression,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2012, pp. 50–52.
- [11] K. Gharibdoust, A. Tajalli, and Y. Leblebici, “Hybrid NRZ/multi-tone serial data transceiver for multi-drop memory interfaces,” *IEEE J. Solid-State Circuits*, vol. 50, no. 12, pp. 3133–3144, Dec. 2015.
- [12] W.-H. Cho *et al.*, “A 5.4-mW 4-Gb/s 5-band QPSK transceiver for frequency-division multiplexing memory interface,” in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, San Jose, CA, USA, Sep. 2015, pp. 1–4.
- [13] K. Gharibdoust, A. Tajalli, and Y. Leblebici, “A 4×9 Gb/s 1pJ/b hybrid NRZ/multi-tone I/O with crosstalk and ISI reduction for dense interconnects,” *IEEE J. Solid-State Circuits*, vol. 51, no. 4, pp. 992–1002, Apr. 2016.
- [14] W.-H. Cho *et al.*, “A 38mW 40Gb/s 4-lane tri-band PAM-4/16-QAM transceiver in 28nm CMOS for high-speed memory interface,” in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, San Francisco, CA, USA, Jan. 2016, pp. 184–185.
- [15] Y. Du *et al.*, “A 16 Gb/s 14.7 mW tri-band cognitive serial link transmitter with forwarded clock to enable PAM-16/256-QAM and channel response detection in 28 nm CMOS,” in *Symp. VLSI Circuits Dig. Tech. Papers*, Honolulu, HI, USA, Jun. 2016, pp. 172–173.
- [16] A. Amirkhani *et al.*, “A 24 Gb/s software programmable analog multi-tone transmitter,” *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 999–1009, Apr. 2008.
- [17] Y. Li *et al.*, “Carrier synchronisation for multiband RF interconnect (MRFI) to facilitate chip-to-chip wireline communication,” *IEEE Electron. Lett.*, vol. 52, no. 7, pp. 535–537, Apr. 2016.
- [18] B. Razavi, *RF Microelectronics*. Englewood Cliffs, NJ, USA: Prentice-Hall, 1997.
- [19] N. Kocaman *et al.*, “A 3.8 mW/Gbps quad-channel 8.5–13 Gbps serial link with a 5 tap DFE and a 4 tap transmit FFE in 28 nm CMOS,” *IEEE J. Solid-State Circuits*, vol. 51, no. 4, pp. 881–892, Apr. 2016.
- [20] T. O. Dickson *et al.*, “A 1.4 pJ/bit, power-scalable 16×12 Gb/s source-synchronous I/O with DFE receiver in 32 nm SOI CMOS technology,” *IEEE J. Solid-State Circuits*, vol. 50, no. 8, pp. 1917–1931, Aug. 2015.
- [21] Y.-J. Kim and L.-S. Kim, “A 12 Gb/s 0.92 mW/Gb/s forwarded clock receiver based on ILO with 60 MHz jitter tracking bandwidth variation using duty cycle adjuster in 65 nm CMOS,” in *Proc. Symp. VLSI Circuits (VLSIC)*, Kyoto, Japan, Jun. 2013, pp. C236–C237.
- [22] T. O. Dickson *et al.*, “A 1.8-pJ/bit 16×16-Gb/s source synchronous parallel interface in 32 nm SOI CMOS with receiver redundancy for link recalibration,” *IEEE J. Solid-State Circuits*, vol. 51, no. 8, pp. 1744–1755, Aug. 2016.
- [23] R. Kho *et al.*, “A 75 nm 7 Gb/s/pin 1 Gb GDDR5 graphics memory device with bandwidth improvement techniques,” *IEEE J. Solid-State Circuits*, vol. 45, no. 1, pp. 120–133, Jan. 2010.
- [24] S. Y. Kao and S. I. Liu, “A 7.5-Gb/s one-tap-FFE transmitter with adaptive far-end crosstalk cancellation using duty cycle detection,” *IEEE J. Solid-State Circuits*, vol. 48, no. 2, pp. 391–404, Feb. 2013.
- [25] B. Hu, Y. Du, R. Huang, J. Lee, Y.-K. Chen, and M.-C. F. Chang, “A capacitor-DAC-based technique for pre-emphasis-enabled multi-level transmitters,” *IEEE Trans. Circuits Syst. II, Express Briefs*, vol. PP, no. 99, p. 1.

**Yuan Du** (S’14) received the B.S. degree (with Hons.) in electrical engineering from Southeast University, Nanjing, China, in 2009, and the M.S. degree in electrical engineering from the University of California (UCLA), Los Angeles, CA, USA, in 2012. He is currently pursuing the Ph.D. degree with UCLA and Kneron Inc., San Diego, CA, USA, where he is involved in hardware development.

His current research interests include the design of domain-specific computing hardware accelerators, high-speed mixed-signal ICs, and CMOS RF ICs.

Mr. Du was a recipient of the Microsoft Research Asia Young Fellowship in 2008, the Southeast University Chancellor’s Award in 2009, and the Broadcom Fellowship in 2015.



**Wei-Han Cho** (S’14) received the B.S. and M.S. degrees (with Hons.) from National Tsing Hua University, Hsinchu, Taiwan, in 2008 and 2010, respectively. Since 2012, he has been pursuing the Ph.D. degree with the Electrical Engineering Department, University of California, Los Angeles, CA, USA.

His current research interests include energy-efficient dense interconnect circuits.

Mr. Cho was a recipient of the MOE Technologies Incubation Scholarship in 2013 and the UCLA EE Graduate Preliminary Exam Fellowship in 2014.



**Po-Tsang Huang** (M’11) received the B.Sc., M.Sc., and Ph.D. degrees from the Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan, in 2002, 2004, and 2010, respectively.

Currently, he is an Associate Research Fellow with the Department of Electrical and Computer Engineering, National Chiao Tung University, and a Visiting Researcher with the University of California, Los Angeles, CA, USA. His current research interests include low-power digital circuit design, embedded memory design, memory subsystem for heterogeneous SoCs, brain neural sensing microsystems, and low power SoC/SiP integrations, with particular emphasis on inter-chip/intra-chip data communications and interconnect architecture.



**Yilei Li** (S’14) received the B.S. and M.S. degrees in microelectronics from Fudan University, Shanghai, China, in 2009 and 2012, respectively.

He is currently with Kneron Inc., where he is involved in the hardware development. His current research interests include circuit and system design for emerging applications, including software-defined radio, multiband RF interconnect, and terahertz imaging systems.

Mr. Li was a recipient of the Henry Samueli Fellowship in 2012 and the Broadcom Fellowship in 2015.





**Chien-Heng Wong** received the B.S. and M.S. degrees in electrical engineering from National Taiwan University (NTU), Taipei, Taiwan, in 2008 and 2011, respectively. He is currently pursuing the Ph.D. degree with the University of California, Los Angeles, CA, USA.

From 2011 to 2012, he was with Faraday Technology, Hsinchu, Taiwan, as a Mixed-Signal Engineer. From 2012 to 2013, he was with NTU as a Research Assistant. His current research interests include mixed-signal circuits such as PLL, ADC, and wireline transmission.



**Jieqiong Du** (S'16) received the B.S. degree in microelectronics from Shanghai Jiao Tong University, Shanghai, China, in 2012, and the M.S. degree in electrical engineering from the University of California, Los Angeles, CA, USA, in 2014. She is currently pursuing the Ph.D. degree in electrical engineering at the same university.

Her current research interests include high-speed mixed signal circuits and I/O links.



**Yanghyo Kim** (S'10) is currently pursuing the bachelor's degree with the University of California, Los Angeles, CA, USA.

His current research interests include high-speed wireline/wireless data link, mm-wave transceiver/package design, radiometer, and radar.



**Boyu Hu** received the B.Sc. degree (Hons.) from Chu-Ko-Chen Honors College, Zhejiang University, Hangzhou, China, in 2008, the M.S. degree from the Institute of VLSI Design, Zhejiang University, in 2011, and the Ph.D. degree from the University of California, Los Angeles, CA, USA, in 2016.

In 2011, he was an RF/Mixed-Signal IC Designer with Marvell, Shanghai, China, focusing on low-power wireless transceivers. In 2013, he was a Mixed-Signal IC Design Intern with Broadcom, Irvine, CA, USA, focusing on high-precision mixed-signal audio products. His current research interests include high-speed/high-precision mixed signal integrated circuit and system, hardware mapping of complex algorithms, and its related VLSI architecture and design.



**Li Du** (S'15) received the B.S. degree in information science and engineering from Southeast University, Nanjing, China, in 2011, and the M.S. degree in electrical engineering from the University of California (UCLA), Los Angeles, CA, USA. He is currently pursuing the Ph.D. degree with UCLA and Kneron Inc., where he is involved in the hardware development.

He was with the High-Speed Electronics Laboratory at UCLA, where he was in charge of designing high-performance mixed-signal circuits for communication and touch-screen systems. In 2012, he was an Intern with the Broadcom Corporation FM radio team, in charge of designing a second-order continuous-time delta-sigma ADC for directly sampling FM radios. His current research interests include high performance 3-D remote touch sensing systems.



**Chunchen Liu** (M'16) received the B.S. degree in electrical engineering from the National Cheng Kung University, Taiwan, R.O.C., in 2003. He then began an M.S./Ph.D. program in electrical and computer engineering at the University of California, San Diego, CA, USA. During that period, he participated in joint research with UC Berkeley.

From 2007 to 2010, he was the Technical Officer of Wireless Info Tech Ltd (acquired by VanceInfo, NYSE:VIT). After that, he served as the technical team leader, managing and leading several research projects and production development, with Samsung, Mstar, and Qualcomm. He has successfully founded and co-founded several startups including Tyflong Limit, Skyvin, and Rapidbridge (acquired by Qualcomm in 2012). Currently, he is the Chair and CTO of Kneron, San Diego.

Mr. Liu was a recipient of the IBM Problem Solving Award based on the use of the EIP tool suite in 2007. Two of his papers were nominated as Best Paper Award candidates at the IEEE/ACM International Conference on Computer-Aided Design (ICCAD) in 2007 and the IEEE International Conference on Computer Design (ICCD) in 2008.

**Sheau Jiung Lee** received the B.S. degree in physics from National Taiwan University, Taipei, Taiwan, and the M.S. degree in electrical engineering and the Ph.D. degree in material science from the University of Southern California, Los Angeles, CA, USA.

He was with the Hong Kong Applied Science and Technology Research Institute, Hong Kong, as the Research and Development Director, and was the Founder/Co-Founder of three chip design companies. He led the design of DDR, USB, HDMI, PLL, DLL, embedded RAM, and RF silicon tuner, and was also the Architect of the system-on-chip for PC, workstation, multiprocessor server, set-top-box, and TV. He was with the Rockwell Science Center, involved in developing high-speed GaAs integrated circuits. He is currently a Research Fellow with the High Speed Electronics Laboratory at the University of California, Los Angeles, where he leads research projects to develop high-speed mixed-mode circuits.



**Mau-Chung Frank Chang** (M'79–SM'94–F'96) is currently the President of National Chiao Tung University, Hsinchu, Taiwan. He is also the Wintek Chair Professor of Electrical Engineering with the University of California, Los Angeles, CA, USA. His research interests include the development of high-speed semiconductor devices and high-frequency integrated circuits for radio, radar, and imaging system-on-chip applications up to the terahertz frequency regime.

Dr. Chang is a member of the U.S. National Academy of Engineering, a fellow of the U.S. National Academy of Inventors, and an Academician of Academia Sinica of Taiwan. He was honored with the IEEE David Sarnoff Award in 2006 for developing and commercializing GaAs HBT and BiFET power amplifiers for modern high efficiency and high linearity smart-phones throughout the past 2.5 decades.