

# A 5.8-Gb/s Adaptive Integrating Duobinary DFE Receiver for Multi-Drop Memory Interface

Hyun-Wook Lim, Sung-Won Choi, Jeong-Keun Ahn, Woong-Ki Min, Sang-Kyu Lee, Chang-Hoon Baek, Jae-Youl Lee, Gyoo-Cheol Hwang, Young-Hyun Jun, and Bai-Sun Kong, *Member, IEEE*

**Abstract**—This paper describes a 5.8-Gb/s adaptive integrating duobinary decision-feedback equalizer (DFE) receiver for next-generation multi-drop memory interfaces. The proposed receiver combines two established interface techniques, integrating signaling and duobinary signaling, in which the duobinary signal is generated by current integration in the receiver. This combination addresses issues such as input data dependence during integration, the need for precursor equalization, high equalizer gain boosting, and sensitivity to high-frequency noise. The proposed receiver also relaxes the DFE critical timing to gain speed, and embeds a DFE tap into the duobinary decoding to save power and area. Adaptation loops for the common-mode level, the duobinary zero level, the tap coefficients, and timing recovery are incorporated. The proposed DFE receiver was fabricated in a 45-nm CMOS process. Measurement results show that it operates at 5.8 Gb/s in a four-drop channel configuration with seven slave ICs, and the bathtub curve shows a 36% eye opening at a bit error rate of $10^{-10}$.

**Index Terms**—DRAM, duobinary signaling, equalizer, integrating decision-feedback equalizer (DFE), memory, multi-drop interface.

## I. INTRODUCTION

Emerging applications such as cloud computing and data mining require high-speed, low-latency access to large volumes of data [1], [2]. In these applications, a multi-drop channel with multiple memory modules may be needed for time-efficient access to high-density memory [3]. As the data rate increases, however, such a channel becomes hard to speed up, because its high capacitive load and impedance discontinuities limit the bandwidth and cause reflection-induced post-cursor ISI. As an example, Fig. 1 shows the insertion loss versus frequency for different numbers of drops and slave ICs in a multi-drop channel.

Manuscript received May 29, 2016; revised November 10, 2016 and February 3, 2017; accepted February 11, 2017. Date of publication April 7, 2017; date of current version May 23, 2017. This paper was approved by Associate Editor Ichiro Fujimori. This work was supported by the Strategic Joint Project initiated by Samsung Electronics, by the Industrial Strategic Technology Development Program (10052653) funded by the Ministry of Trade, Industry & Energy, and by the Basic Research Program through the National Research Foundation of Korea funded by the Ministry of Education under Grant NRF-2016R1D1A1B03933605. Design tools were supported by IDEC, KAIST. (*Corresponding author: Bai-Sun Kong*)

H.-W. Lim and S.-W. Choi were with the College of Information and Communication Engineering, Sungkyunkwan University, Suwon 440-746, South Korea. They are now with Samsung Electronics, Hwasung 445-701, South Korea.

J.-K. Ahn, W.-K. Min, and B.-S. Kong are with the College of Information and Communication Engineering, Sungkyunkwan University, Suwon 440-746, South Korea (e-mail: bskong@skku.edu).

S.-K. Lee, C.-H. Baek, J.-Y. Lee, G.-C. Hwang, and Y.-H. Jun are with Samsung Electronics, Hwasung 445-701, South Korea.

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2017.2675923



Fig. 1. Insertion loss versus frequency graph depending on the number of drops and slave ICs in a multi-drop channel.

For a stubless four-drop channel, the additional loss of the four-drop configuration with seven slaves relative to a point-to-point configuration reaches 9 dB at 3 GHz, and the total loss of the same four-drop channel is as much as 18 dB at that frequency. Existing standards such as DDR4, widely used for main memory, cannot meet the performance required by these applications, since their maximum data rate is limited to 3.2 Gb/s [4]. A critical issue in designing a link for these applications is therefore how to let the interface transceivers provide a sufficiently high data rate without being impaired by the ISI and reflection noise caused by channel bandwidth limitation and imperfect termination.
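To put the quoted losses in perspective, the sketch below converts an insertion loss in dB into the fraction of the transmitted amplitude that reaches the receiver. It assumes, as is usual for insertion loss, that the dB values are voltage ratios (20·log10); the helper name is illustrative only.

```python
def db_loss_to_amplitude_ratio(loss_db: float) -> float:
    """Fraction of the transmitted voltage amplitude left after `loss_db` of insertion loss."""
    return 10 ** (-loss_db / 20)


# Loss values quoted in the text for the stubless four-drop channel at 3 GHz.
for label, loss_db in [("extra loss vs. point-to-point", 9.0), ("total loss", 18.0)]:
    print(f"{label}: {loss_db} dB -> x{db_loss_to_amplitude_ratio(loss_db):.3f}")
```

Under this assumption, the 18-dB total loss leaves only about one-eighth of the transmitted amplitude at 3 GHz.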

It is well known that an effective way to address the issues of multi-drop channels is a decision-feedback equalizer (DFE) [5]–[10], which cancels post-cursor ISI at the receiver side using previous decision data. Unlike a feedforward equalizer [11], [12], which is sensitive to and amplifies incoming high-frequency noise, the DFE does not amplify this noise [2], [3]. However, at very high data rates the high-frequency noise due to crosstalk and reflection can be substantial, so it must be carefully considered for reliable DFE operation [13]. One way to solve this problem is an integrating DFE [13]–[17], which performs current integration before data decision to average out high-frequency noise components. Duobinary signaling, which has traditionally been used in optical communication and has recently been adopted in electrical wireline communication,



Fig. 2. Integrating DFE integrator input–output waveforms as a single pulse response (a) without and (b) with feedback tap latency.



Fig. 3. Input data dependency problem of integrating DFE receiver.

is another viable solution to the issues related to channel bandwidth limitation [18]–[21]. Ideally, duobinary signaling supports a data rate twice the channel bandwidth, because a duobinary signal, in which each single-bit binary datum is represented by a pair of identical cursors, has its frequency spectrum compressed by half. Duobinary signaling can also exploit the channel attenuation as part of its transfer function, which relaxes the pre-emphasis and equalizer design.
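The three-level structure and the spectral compression can be checked with the textbook discrete-time duobinary model $y[n] = x[n] + x[n-1]$, whose transfer function $1 + z^{-1}$ has magnitude $2|\cos(\pi f T)|$ and a null at the Nyquist frequency $1/(2T)$. In the proposed receiver this response is formed by the channel, equalizer, and integrator rather than by a Tx encoder, so the sketch below is only a generic illustration; the function names are hypothetical.

```python
import cmath
import math


def duobinary_encode(bits):
    """Map bits {0, 1} to symbols {-1, +1} and apply y[n] = x[n] + x[n-1], i.e. 1 + z^-1.

    The symbol before the first bit is assumed to equal the first symbol.
    """
    if not bits:
        return []
    x = [1 if b else -1 for b in bits]
    prev = x[0]
    out = []
    for s in x:
        out.append(s + prev)
        prev = s
    return out


def duobinary_gain(f_norm):
    """|1 + exp(-j*2*pi*f*T)| = 2|cos(pi*f*T)|, with f_norm = f*T (frequency / symbol rate)."""
    return abs(1 + cmath.exp(-2j * math.pi * f_norm))


print(duobinary_encode([1, 1, 0, 1, 0, 0, 0, 1]))  # [2, 2, 0, 0, 0, -2, -2, 0]
print(round(duobinary_gain(0.0), 6), round(duobinary_gain(0.5), 6))  # 2.0 0.0
```

Every output lands on one of three levels, and the zero level appears exactly at the bit transitions, which is the property the later sections rely on.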

To boost the data rate of a multi-drop channel for memory applications, this paper combines the foregoing current integration and duobinary signaling and proposes an integrating duobinary DFE receiver [22]. The synergistic combination of these signaling methods provides the following advantages: 1) no input data dependence during integration; 2) lower equalizer gain boosting; 3) no need for precursor equalization; 4) improved robustness to high-frequency noise; 5) relaxed DFE critical timing; and 6) DFE taps embedded into the duobinary decoding. Section II discusses the problems of conventional approaches. Section III describes the concept, architecture, and circuit implementation of the proposed receiver. Section IV presents experimental results, and Section V concludes the paper.

## II. DEMERITS OF CONVENTIONAL SCHEMES

An integrating DFE receiver integrates the incoming signal and evaluates the output at the end of the integration period. Since the integration filters out high-frequency noise, the evaluated output is largely immune to such noise, including the noise enhanced by Rx equalization. However, the integrating DFE has several drawbacks. To illustrate them, Fig. 2(a) and (b) depict the integrator input–output waveforms for a single pulse response without and with feedback tap latency, respectively. The gray waveforms in Fig. 2(a) represent the receiver input disrupted by noise and the resulting



Fig. 4. Binary versus duobinary DFE. (a) Received single pulse shaping. (b) Signal voltage levels and swing.



Fig. 5. Single pulse waveform shaping of the proposed integrating duobinary DFE receiver.

integrator output, which shows substantial post-cursor ISI. The black waveforms represent the DFE summer output (i.e., the integrator input), shaped by subtracting scaled previous decision data, and the resulting integrator output, in which the post-cursor ISI is well eliminated. The corresponding



Fig. 6. Overall architecture of the proposed integrating duobinary DFE receiver.



Fig. 7. ISI-induced input data dependency: integrator input–output waveforms for (a) conventional integrating DFE and (b) proposed integrating duobinary DFE.

waveforms when the feedback tap latency is considered are depicted in Fig. 2(b). In this case, the latency narrows the integration period, so the integrated voltage amplitudes may be lower than those in Fig. 2(a), degrading the eye opening. Compensating for this requires more equalizer gain boosting or tap weight over-emphasis, which increases the design overhead. Another problem of the integrating DFE receiver is the ISI-induced input data dependence of the eye opening. Depending on how often the data change, ISI corrupts the channel data in different forms, as shown in Fig. 3. The upper waveforms show two extreme cases of the received data: the black waveforms correspond to no data change, resulting in no ISI, and the red waveforms correspond to a data change in every cycle, resulting in large ISI. The lower waveforms in Fig. 3 are the resulting integrated voltage levels. When the channel data stay at the same logic level, integration develops a large voltage, resulting in a relatively large eye opening (black waveforms). When the channel data change every cycle, however, the integrated voltage is considerably degraded (red waveforms). This ISI-induced degradation of the eye opening increases the bit error rate. In [17], a sample-and-hold circuit



Fig. 8. Precursor equalization. (a) Conventional integrating DFE. (b) Proposed integrating duobinary DFE.



Fig. 9. Immunity to high-frequency noise. (a) Conventional duobinary DFE. (b) Proposed integrating duobinary DFE.

was placed in front of the integrator to cope with this problem, but it may reduce immunity to high-frequency noise and incur power and area overhead.
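Both drawbacks can be reproduced with a small behavioral sketch. It assumes a linear channel, a made-up single-pulse response sampled four times per UI, and an ideal integrate-and-dump receiver with no DFE feedback; all names and values are hypothetical. Integrating the last UI of a static and a toggling pattern shows how the post-cursor tail adds to or subtracts from the integrated voltage (the trend of Fig. 3), and skipping the first sample of the window mimics the shorter integration period caused by tap latency (Fig. 2(b)).

```python
# Hypothetical single-pulse response sampled 4 times per UI (illustrative values only).
SAMPLES_PER_UI = 4
PULSE = [0.2, 0.6, 0.8, 0.7,    # main-cursor UI
         0.5, 0.3, 0.2, 0.1,    # first post-cursor UI
         0.05, 0.05, 0.0, 0.0]  # second post-cursor UI


def received(bits):
    """Linear channel model: superpose +/-1-weighted, UI-shifted copies of PULSE."""
    wave = [0.0] * (len(bits) * SAMPLES_PER_UI + len(PULSE))
    for k, bit in enumerate(bits):
        symbol = 1.0 if bit else -1.0
        for i, p in enumerate(PULSE):
            wave[k * SAMPLES_PER_UI + i] += symbol * p
    return wave


def integrate_ui(wave, ui_index, skip=0):
    """Integrate-and-dump over one UI (rectangle rule, unit sample spacing).

    `skip` drops the first samples of the window to mimic feedback tap latency.
    """
    start = ui_index * SAMPLES_PER_UI
    return sum(wave[start + skip:start + SAMPLES_PER_UI])


static = received([1, 1, 1, 1])  # no data change
toggle = received([0, 1, 0, 1])  # data change every cycle
print(f"static: {integrate_ui(static, 3):.2f}")                            # 3.50
print(f"toggle: {integrate_ui(toggle, 3):.2f}")                            # 1.30
print(f"static, 1-sample latency: {integrate_ui(static, 3, skip=1):.2f}")  # 2.75
```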

As mentioned in Section I, duobinary signaling has been used in applications where a band-limited channel is the transmission medium, because it reduces the effective bandwidth of the signal sent through the channel. Fig. 4 compares binary and duobinary DFEs in terms of the shaping of signals received from a band-limited channel and the associated eye diagrams. As shown in Fig. 4(a), the binary DFE (top) requires the first post cursor to be fully attenuated so that it causes no ISI to the next symbol, whereas the duobinary DFE (bottom) requires the first post cursor to be emphasized to have the



Fig. 10. Timing critical path. (a) Integrating speculative DFE. (b) Integrating duobinary DFE.



Fig. 11. Structure, Bode plot, and simulated waveforms of the duobinary equalizer.

same amplitude as the main cursor. This type of encoding is useful for a band-limited channel, since it exploits the roll-off characteristic of the channel and does not require high gain boosting for equalization. However, as seen in Fig. 4(b), a duobinary signal has three valid signal levels. Its eye height is therefore half that of binary signaling, resulting in a 6-dB SNR loss.
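The 6-dB figure follows directly from the eye-height ratio. The sketch below assumes equal peak-to-peak swing and equal noise for both formats, so the SNR penalty is simply the eye-height ratio expressed in dB; the function name is illustrative.

```python
import math


def eye_height_penalty_db(num_levels: int) -> float:
    """SNR penalty of an M-level signal relative to binary at the same peak-to-peak swing.

    A fixed swing split into (num_levels - 1) stacked eyes makes each eye
    1 / (num_levels - 1) as tall as the single binary eye.
    """
    return 20 * math.log10(num_levels - 1)


print(f"binary:    {eye_height_penalty_db(2):.2f} dB")  # 0.00 dB
print(f"duobinary: {eye_height_penalty_db(3):.2f} dB")  # 6.02 dB
```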

## III. PROPOSED INTEGRATING DUOBINARY DFE RECEIVER

### A. Overall Architecture

This section introduces a new type of wireline receiver that combines the integrating DFE and the duobinary DFE to exploit their merits and avoid the drawbacks described in the previous section. Unlike in a traditional duobinary DFE, the two cursor samples constituting a duobinary signal are generated by current integration. Fig. 5 illustrates conceptually how a single binary pulse corrupted by channel reflection and ISI is shaped into a duobinary signal via integration and equalization. Without equalization, the incoming waveform corrupted by channel noise (gray waveform in the top figure) results in an integrator output (gray waveforms and circles in the bottom figure) with a nonzero first post cursor (Post1); the other post cursors, Post2 to Post7, are also nonzero. With equalization (black waveforms in Fig. 5), the main cursor is integrated during its unit interval (UI) to form the first cursor of a duobinary signal. Then, unlike in the conventional integrating DFE, where Post1 is fully removed, Post1 is partially removed or emphasized so that its amplitude after integration equals that of the main cursor, forming the second cursor of the duobinary signal. At the same time, the second post cursor (Post2) is adjusted so that its amplitude after integration is zero. The subsequent ISI terms are also removed by the corresponding DFE taps, which force their integrated amplitudes to zero.
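The cursor targets described above can be summarized in a short sketch that, given hypothetical integrated cursor values, reports how much each post cursor must change and which receiver block handles it, following the block assignment given with Fig. 6 in the next paragraph. The cursor values are invented for illustration, and in the chip the targets are reached gradually by the adaptation loops of Section III-C rather than in one step.

```python
def equalization_plan(cursors, duobinary=True):
    """Per-post-cursor correction (target - current) and the block that applies it.

    cursors[0] is the integrated main cursor and cursors[k] is Post k (k = 1..7).
    Binary integrating DFE: every post cursor is driven to zero.
    Integrating duobinary DFE: Post1 is driven to the main cursor, Post2..Post7 to zero.
    """
    main = cursors[0]
    plan = {}
    for k in range(1, len(cursors)):
        target = main if (duobinary and k == 1) else 0.0
        if duobinary and k <= 2:
            block = "duobinary equalizer + timing recovery"
        elif duobinary and k == 7:
            block = "tap-embedded decoder"
        else:
            block = f"DFE tap H{k}"
        plan[f"Post{k}"] = (round(target - cursors[k], 6), block)
    return plan


# Hypothetical integrated cursors before equalization: main, Post1..Post7.
raw = [1.0, 0.6, 0.3, 0.1, -0.05, 0.04, 0.02, -0.03]
for name, (delta, block) in equalization_plan(raw).items():
    print(f"{name}: change by {delta:+.2f} via {block}")
print("binary DFE Post1 change:", equalization_plan(raw, duobinary=False)["Post1"][0])
```

For these made-up values the duobinary target asks Post1 to change by only +0.40, versus -0.60 for full cancellation in a binary integrating DFE, which is one way to see the partial cancellation of the first post cursor.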

Fig. 6 shows the top-level block diagram of the proposed receiver that implements the operation described above. The transmitter and the channel are also briefly depicted for ease of understanding, where an ordinary binary transmitter is assumed. The receiver consists of a duobinary equalizer, a duobinary integrator, a tap-embedded duobinary decoder, and an adaptation loop. The gain-boosting equalization of the duobinary equalizer and the clock phase adjustment of the timing recovery loop cooperatively make the integrated main and first post-cursor amplitudes identical to each other and the integrated second post-cursor amplitude zero. The integrated amplitudes of the remaining post cursors, Post3 to Post6, are removed by the associated DFE taps. The integration itself is performed by the duobinary integrator, whose output is a pure duobinary signal free of post-cursor ISI up to Post6. Finally, the duobinary signal is decoded into its binary counterpart by the tap-embedded duobinary decoder, which embeds the circuit for eliminating the Post7 ISI.

The proposed DFE receiver provides several important advantages. First, generating the duobinary signal by current integration eliminates input data dependence. As explained earlier, the eye after integration in the conventional integrating DFE has an ISI-induced data dependence related to the rate of input change, as redrawn in Fig. 7(a). In the extreme case where the data change every cycle, the integrator output eye is severely closed by ISI. In contrast, as the eye diagram of the proposed DFE in Fig. 7(b) shows, any binary input results in one of three valid duobinary signal levels regardless of the input pattern, even with substantial ISI. This is because the duobinary positive and negative levels (+1, -1) occur only when the associated binary signals



Fig. 12. Structure of integrating duobinary DFE.

retain the same logic value (black waveforms), in which case there is no ISI, and the duobinary zero level (0) occurs only when the binary signal has a high-to-low or low-to-high transition (blue waveforms), which naturally yields the same zero amplitude if the waveforms are symmetric. Because the eye no longer depends on the data pattern, the maximum integrator gain is always available regardless of the input, which leads to another important advantage: less severe equalizer gain boosting. The gain boosting is further reduced because the proposed receiver only partially cancels the first post cursor and exploits the channel roll-off characteristic as part of its transfer function. Lower equalizer gain boosting simplifies the tap circuit design and reduces power and area overhead. Another important advantage of the proposed receiver is that no precursor equalization is needed. In the conventional integrating DFE, the integration window must be carefully chosen to cover the major signal power area so as to maximize the integrator gain, so some precursor amplitude inevitably remains, as shown in Fig. 8(a). In the proposed integrating duobinary DFE, on the other hand, the integrator output must provide a pair of identical cursors to form a duobinary signal. To this end, the first half of the waveform is integrated to generate the first main cursor, and the second half is integrated to generate the second main cursor. Hence, assuming a symmetric waveform, the area that appears as the precursor in the conventional integrating DFE is always included in the first main cursor area of the duobinary DFE, as depicted in Fig. 8(b). This feature helps obviate explicit precursor equalization. As another merit, the proposed receiver is more robust to high-frequency noise than the conventional duobinary receiver, as depicted in Fig. 9.
For the conventional duobinary DFE, high-frequency noise superimposed on the input waveform may make the amplitudes of the two main cursors differ from each other [Fig. 9(a)]. For the proposed integrating duobinary DFE, in contrast, the two main cursors



Fig. 13. Structure of duobinary integrator.

of a duobinary signal are obtained by integration, so the high-frequency noise is low-pass filtered [Fig. 9(b)]. Another advantage of the proposed receiver is an increased critical timing margin. The conventional integrating speculative DFE has a one-UI margin from the decision to the start of the Post2 integration [Fig. 10(a)]. In the proposed DFE, since there is no feedback path for Post2, the critical timing path moves to the Post3 feedback [Fig. 10(b)]. Therefore, a two-UI margin is achieved without sacrificing integration time. Embedding the seventh DFE tap into the duobinary decoder also saves power and area, because it removes a power-consuming current digital-to-analog converter (DAC) from the DFE.
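To translate the UI-based margins into absolute time, the sketch below evaluates them at the 5.8-Gb/s rate reported for the test chip. It is plain arithmetic and ignores the actual latch, DAC, and comparator delays, which the text bounds only qualitatively.

```python
def ui_ps(data_rate_gbps: float) -> float:
    """Unit interval in picoseconds for a data rate given in Gb/s."""
    return 1e3 / data_rate_gbps


ui = ui_ps(5.8)  # measured data rate of the test chip
print(f"UI at 5.8 Gb/s: {ui:.1f} ps")                                # 172.4 ps
print(f"1-UI margin (integrating speculative DFE): {ui:.1f} ps")     # 172.4 ps
print(f"2-UI margin (integrating duobinary DFE): {2 * ui:.1f} ps")   # 344.8 ps
```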

### B. Circuit Implementation

As mentioned earlier, the duobinary equalizer at the front of the proposed receiver preshapes the amplitudes of the first two post cursors. It is based on a source-degenerated continuous-time linear equalizer (CTLE), configured as a four-stage differential amplifier with $RC$ degeneration. Fig. 11 shows the structure of a single amplifier stage together with its frequency response and simulated waveforms for different values of $R_S$ and $C_S$. As $R_S$ and $C_S$ increase, the peaking becomes steeper and the dc gain decreases, resulting in



Fig. 14. Tap-embedded duobinary decoder.



Fig. 15. Structure and operation of mux-embedded comparator.

more abrupt post-cursor attenuation while the precursor shape and the main-cursor peak level remain nearly unchanged. The values of $R_S$ and $C_S$ are selected from a lookup table indexed by the control code $H_{1-2}$. When the Post2 cursor at the integrator output is positive, $H_{1-2}$ is increased to select larger $R_S$ and $C_S$ values; when the Post2 cursor is negative, $H_{1-2}$ is decreased to select smaller values, as verified by the simulated waveforms in Fig. 11. As this operation shows, the role of the duobinary equalizer is not to equalize the precursor ISI but to shape the amplitudes of the first and second post cursors so as to generate a duobinary signal with no ISI.
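The $R_S$/$C_S$ trend and the $H_{1-2}$ update rule can be illustrated with the commonly used first-order model of a single source-degenerated CTLE stage, $H(s) = A_{dc}(1 + s/\omega_z)/(1 + s/\omega_p)$ with $A_{dc} = g_m R_D/(1 + g_m R_S/2)$, $\omega_z = 1/(R_S C_S)$, and $\omega_p = (1 + g_m R_S/2)/(R_S C_S)$. The output pole and the other three stages are ignored, and the transconductance, load, and lookup-table entries below are hypothetical, not the values used in the chip.

```python
import math

GM = 10e-3  # hypothetical transconductance per side, in siemens
RD = 200.0  # hypothetical load resistance, in ohms

# Hypothetical lookup table indexed by the H_1-2 code: (R_S in ohms, C_S in farads).
LUT = [(100.0, 100e-15), (200.0, 150e-15), (300.0, 200e-15), (400.0, 250e-15)]


def ctle_gain(f_hz, rs, cs, gm=GM, rd=RD):
    """|H(j*2*pi*f)| of a first-order source-degenerated CTLE stage (output pole ignored)."""
    degeneration = 1 + gm * rs / 2
    a_dc = gm * rd / degeneration
    wz = 1 / (rs * cs)
    wp = degeneration / (rs * cs)
    w = 2 * math.pi * f_hz
    return a_dc * abs(complex(1, w / wz) / complex(1, w / wp))


def update_h12(code, post2_sign):
    """Increase the code (larger R_S, C_S) for positive Post2, decrease it for negative Post2."""
    if post2_sign > 0:
        code += 1
    elif post2_sign < 0:
        code -= 1
    return max(0, min(len(LUT) - 1, code))  # clamp to the table range


for code, (rs, cs) in enumerate(LUT):
    dc = ctle_gain(0.0, rs, cs)
    peaking_db = 20 * math.log10(1 + GM * rs / 2)  # high-frequency gain / dc gain
    print(f"H_1-2={code}: R_S={rs:.0f} ohm, dc gain={dc:.3f}, peaking={peaking_db:.2f} dB")
```

In this model a larger $R_S$ lowers the dc gain while raising the high-to-low-frequency gain ratio, and a larger $C_S$ moves the zero to a lower frequency, which matches the qualitative behavior described for Fig. 11.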

Fig. 12 shows the structure of the integrating duobinary DFE, which contains the duobinary integrator and the tap-embedded duobinary decoder. It adopts a half-rate architecture with two summing integrators, four mux-embedded comparators and two multiplexers for duobinary decoding, and a series of latches that provide the feedback tap data. Differential signaling is employed for better immunity to crosstalk, simultaneous switching noise, and common-mode (CM) noise. At the falling edge of CLK, the even data path starts current summing and integration, in which the duobinary equalizer output and four feedback taps (for the third to sixth post cursors) are added. Since the Post2 feedback path is removed, the timing-critical path moves to the third post-cursor feedback, which includes only the latencies of a latch and an integrating



Fig. 16. Structure of adaptation loop.

summer (the red line), relaxing the DFE timing constraint. This path starts from CLK and stretches through the latch output to the current DAC H3, and its latency is well within one UI. At the rising edge of CLK, the duobinary decoder compares the duobinary signal with a set of reference levels and selects one output based on the data from the other half of the data path. The mux-embedded comparator, described below, is given a 2-UI margin (the blue line), because its delay can be quite long for the worst-case eye opening. Fig. 13 shows the detailed circuit of the duobinary integrator. It sums and integrates the duobinary equalizer output and the feedback tap signals H3 to H6 to generate a pair of equal-amplitude signals, i.e., duobinary data free of ISI and high-frequency noise. Additional capacitors ($C_{CM}$) between the clock and the outputs compensate for charge sharing during operation and help set the CM voltage during CM adaptation.

After all post-cursor ISI except Post7 is cancelled, the integrator differential output converges to one of three distinct signal levels. The duobinary high (low) level occurs when two consecutive binary bits are both high (low), whereas the duobinary zero level occurs when the binary signal changes level. Hence, the duobinary signal can be converted back into the original binary signal by comparing it with a high or a low threshold, which is done by the tap-embedded duobinary decoder shown in Fig. 14. The key idea here is to embed the circuit that equalizes the seventh post-cursor ISI into the decoder. To do this, each of the high and low threshold voltages



Fig. 17. Timing diagram for adaptation. (a) Integrator output CM adaptation. (b) Duobinary zero level and DFE tap adaptation. (c) Timing recovery.

splits into two distinct threshold levels selected by the value of $D(n-7)$. The integrator output is then compared with one of these four threshold levels, which performs the duobinary-to-binary conversion and eliminates the seventh post-cursor ISI at the same time. The decoder also works as a speculative DFE: when the output of the VH+ or VH- comparator is low, the next data will be compared with VL+ or VL-. For efficient embedding of the seventh DFE tap, each comparator is a mux-embedded latch-type sense amplifier in which the single input tail is replaced by three tails. Since a conventional three-stacked sense amplifier requires more headroom and has a larger output load capacitance that degrades comparator offset and delay, our mux-embedded comparator is based on the double-tail sense amplifier, which has low offset and reliable CM operation [23]. This minimizes the effect of the increased parasitic capacitance at the latch input, which is the trade-off of this approach. Fig. 15 shows the structure of the mux-embedded comparator as a modified double-tail sense amplifier. It has three parallel tails, one input tail and two reference tails, and operates with the input tail and one of the reference tails selected by $D(n-7)$. When $D(n-7)$ is high (low), VH+ (VH-) is obtained using the input and positive (negative) H7 tails. VL+ and VL- are generated similarly. Embedding the DFE tap into the duobinary decoder saves area and power. The resulting binary data are fed into a series of latches to generate the feedback data for ISI cancellation.
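The decoding rule can be sketched behaviorally. The sketch assumes ±1 symbols, a normalized duobinary cursor `c`, a hypothetical residual Post7 cursor `h7`, and Post3 to Post6 already cancelled by the DFE taps. The previous decision selects the high (VH) or low (VL) threshold pair, as in a speculative DFE, and $D(n-7)$ selects the + or - member of that pair. The threshold values $c \pm h_7$ and $-c \pm h_7$ follow from this model and are not the chip's actual reference voltages; the function names are hypothetical.

```python
import random


def integrator_output(x, c=1.0, h7=0.3):
    """Integrator output with Post3-Post6 already cancelled by the DFE taps.

    y[n] = c * (x[n] + x[n-1]) + h7 * x[n-7], with symbols x in {-1, +1}.
    Symbols before the start of the sequence are assumed to be -1.
    """
    y = []
    for n in range(len(x)):
        prev = x[n - 1] if n >= 1 else -1
        pre7 = x[n - 7] if n >= 7 else -1
        y.append(c * (x[n] + prev) + h7 * pre7)
    return y


def tap_embedded_decode(y, c=1.0, h7=0.3):
    """Duobinary-to-binary decoding with the Post7 tap folded into four thresholds."""
    vh = {+1: c + h7, -1: c - h7}    # VH+ / VH-, chosen by D(n-7)
    vl = {+1: -c + h7, -1: -c - h7}  # VL+ / VL-, chosen by D(n-7)
    d = []
    for n, v in enumerate(y):
        prev = d[n - 1] if n >= 1 else -1  # speculative choice between VH and VL
        pre7 = d[n - 7] if n >= 7 else -1
        thresholds = vh if prev > 0 else vl
        d.append(+1 if v > thresholds[pre7] else -1)
    return d


random.seed(0)
bits = [random.choice((-1, 1)) for _ in range(1000)]
rx = [v + random.uniform(-0.2, 0.2) for v in integrator_output(bits)]
print(tap_embedded_decode(rx) == bits)  # True: the decision margin c = 1 exceeds |noise| <= 0.2
```

With correct earlier decisions, the selected threshold sits exactly between the two levels that remain possible, so each comparison has a margin of `c` regardless of the Post7 residue.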

### C. Adaptation

The receiver achieves accurate adaptation by extracting information from the integrated duobinary signal. There are



Fig. 18. Photomicrograph of the test chip.



Fig. 19. Measurement environment.

four adaptation loops to facilitate receiver operation, which adjust the integrator output CM level, the duobinary zero level, DFE tap coefficients, and the sampling clock timing.



Fig. 20. Measured (a) single pulse response and (b) eye diagram for received input.

Fig. 16 shows the structure of the loops that perform these adaptations. The CM error signal ERC, generated by the CM detector, is used to optimize the CM level of the integrator output, and the CM adaptation block makes this level track the target CM level. The error signal ERR, generated by the ERR detector at each data transition, is shared by the duobinary zero level, DFE tap coefficient, and timing recovery adaptations. The error signals are then filtered and used to generate the required cost functions. Fig. 17 depicts how these adaptations are performed. The CM level adaptation is performed when there is no data transition. It adjusts the integrator tail current using $H_{MAIN}$ so that the integrator output CM level ($VCM_{OUT}$) approaches the target CM level ($VCM_{Target}$), as shown in Fig. 17(a). For the CM adaptation, the CM target voltage is defined as

$$VCM_{Target} = (V_{REFP} + V_{REFN})/2 \quad (1)$$

where  $V_{REFP}$  and  $V_{REFN}$  are the reference voltages for the duobinary decoder. Since the integrator output as a duobinary signal is decoded using these reference voltages, the CM error signal can be defined as

$$ERC(n) = VCM_{OUT}(n) - VCM_{Target}. \quad (2)$$

Here, $VCM_{OUT}$ is the CM voltage of the integrator outputs $V_{OUTEP}$ and $V_{OUTEN}$. The CM adaptation is based on the stochastic gradient descent algorithm and can be written as

$$H_{MAIN}(n+1) = H_{MAIN}(n) + \mu_{CM} \cdot \text{sign}\{ERC(n)\} \quad (3)$$

where $H_{MAIN}$ is the CM level control signal and $\mu_{CM}$ is the adaptation step size. As implied by (3), a positive (negative) CM error increases (decreases) $H_{MAIN}$, which lowers (raises) the integrator output CM level through the increased (decreased) tail current.
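A minimal simulation of (1) to (3) shows the sign-based CM loop converging. The plant model, in which $VCM_{OUT}$ falls linearly by `k_mv` per $H_{MAIN}$ code, and all voltages and step sizes are hypothetical assumptions; only the update rule follows the text. Voltages are kept in integer millivolts so the loop settles exactly.

```python
def sign(v):
    """Return +1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def adapt_cm(h_main=0, mu_cm=1, steps=60, v0_mv=900, k_mv=10, vrefp_mv=650, vrefn_mv=550):
    """Sign-based CM loop of Eqs. (1)-(3) on a hypothetical linear plant.

    Plant assumption: VCM_OUT = v0_mv - k_mv * H_MAIN (more tail current -> lower CM).
    Returns the final H_MAIN code and the VCM_OUT trajectory in millivolts.
    """
    vcm_target = (vrefp_mv + vrefn_mv) / 2  # Eq. (1)
    trajectory = []
    for _ in range(steps):
        vcm_out = v0_mv - k_mv * h_main     # hypothetical integrator CM response
        erc = vcm_out - vcm_target          # Eq. (2)
        h_main += mu_cm * sign(erc)         # Eq. (3)
        trajectory.append(vcm_out)
    return h_main, trajectory


h, traj = adapt_cm()
print(h, traj[0], traj[-1])  # 30 900 600
```

A real loop would typically dither by one code around the target because the error is rarely exactly zero; the integer plant here just makes the convergence easy to read.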

Transition information and relevant error polarity are used for the adaptation of the duobinary equalizer, the integrating DFE, and the timing recovery. The error information for these adaptations is defined as

$$ERR(n) = V_{OUTEP}(n) - V_{OUTEN}(n) \quad (4)$$

where $V_{OUTEP}$ and $V_{OUTEN}$ are the integrator outputs at each sampling point. ERR is effective only when $D(n-1) \oplus D(n) = 1$, which means that a data transition



Fig. 21. Generation of duobinary signal at integrator output.



Fig. 22. Eye diagrams at the duobinary equalizer and summing integrator outputs: under-equalized (left), optimally equalized (middle), and over-equalized (right).

has occurred. The duobinary zero level adaptation sets the coefficients of the duobinary equalizer as well as those of the integrating DFE, as shown in Fig. 17(b). It is based on the sign-sign LMS algorithm. Data transitions are detected by a digital data pattern filter, whose result is used as the error function for adaptation. Since the duobinary equalizer is adapted to minimize the second post-cursor ISI, its adaptation equation can be written as

$$H_{1-2}(n+1) = H_{1-2}(n) + \mu_{DuoEQ} \cdot \text{sign}\{ERR(n)\} \cdot \text{sign}\{D(n-2)\}. \quad (5)$$

Here, $\text{sign}\{D(n-2)\}$ is the sign of the decision data from two cycles earlier, and $\mu_{DuoEQ}$ is the adaptation step size. The adaptation equation for the DFE coefficients can similarly be

TABLE I  
PERFORMANCE COMPARISON

|                            | [19]                         | [25]                     | [24]             | [15]                             | [14]                       | [10]                      | [16]                       | This work                       |
|----------------------------|------------------------------|--------------------------|------------------|----------------------------------|----------------------------|---------------------------|----------------------------|---------------------------------|
| Process                    | 90nm                         | 40nm                     | -                | 180nm                            | 180nm                      | 130nm                     | 250nm                      | 45nm                            |
| Supply (V)                 | 1.5                          | 0.9                      | -                | 1.8                              | 1.8                        | 1.2                       | 2.5                        | 1.1                             |
| Channel                    | Point-to-Point (diff.) 3.93" | 2drop (diff.) 12"        | 3drop (diff.) 6" | 2drop/4drop (single-ended) 2.59" | 2drop (single-ended) 4.65" | 4drop (single-ended) 6.3" | 4drop (single-ended) 7.54" | 4drop (diff.) (7 slaves) 4.72"  |
| Max. speed (Gb/s/lane)     | 20                           | 7.5                      | 10               | 3.8 (2-drop)<br>2.6 (4-drop)     | 3.2                        | 2.6                       | 2.0                        | 5.8                             |
| Signaling Method           | Tx                           | Duobinary (Pre-emphasis) | Multi-tone       | Multi-tone                       | NRZ (No emphasis)          | NRZ (No emphasis)         | NRZ (No emphasis)          | NRZ (No emphasis)               |
|                            | Rx                           | Duobinary                | Multi-tone (DFE) | Multi-tone (DFE)                 | NRZ (Integrating DFE)      | NRZ (Integrating DFE)     | NRZ (Integrating DFE)      | NRZ (Integrating Duobinary DFE) |
| Power @Rx (mW/Gb/s)        | 3                            | 1                        | -                | 18.94                            | 21.25                      | 42.9                      | 5                          | 2.45                            |
| Rx Area (mm²)              | 0.11                         | 0.015                    | -                | 0.18                             | 0.18                       | 0.187                     | 0.026                      | 0.087                           |



Fig. 23. Simulated adaptation procedure (top), and corresponding integrator input–output waveforms at specific time points (bottom).

written as

$$H_X(n+1) = H_X(n) + \mu_{\text{DFE}} \cdot \text{sign}\{\text{ERR}(n)\} \cdot \text{sign}\{D(n-X)\} \quad (6)$$

where  $X$  denotes DFE tap index that can have an integer value from three to six, so the taps are updated using  $D(n-3)$ ,  $D(n-4)$ ,  $D(n-5)$ , and  $D(n-6)$ .
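Equation (6) can be exercised with a behavioral sketch. It assumes ±1 symbols, error-free decisions, a normalized duobinary cursor `c`, and hypothetical residual Post3 to Post6 values; ERR is taken as the integrator output at transitions, where the ideal duobinary level is zero, mirroring the rule that ERR is effective only at data transitions. Equation (5) for $H_{1-2}$ has the same sign-sign form with $D(n-2)$, but in the chip it steps the equalizer lookup table rather than a current DAC, so it is not modeled here.

```python
import random


def sign(v):
    """Return +1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def ss_lms_dfe(post, c=1.0, mu=0.005, n_bits=20000, seed=0):
    """Sign-sign LMS for DFE taps H3..H6 following Eq. (6), gated on data transitions.

    post: hypothetical residual post cursors {k: value} seen at the integrator output.
    The DFE subtracts taps[k] * x[n-k], and decisions are assumed to be error free.
    """
    rng = random.Random(seed)
    x = [rng.choice((-1, 1)) for _ in range(n_bits)]
    taps = {k: 0.0 for k in post}
    for n in range(max(post), n_bits):
        if x[n] == x[n - 1]:
            continue  # ERR is used only at transitions, where the ideal level is zero
        err = c * (x[n] + x[n - 1]) + sum((post[k] - taps[k]) * x[n - k] for k in post)
        for k in post:
            taps[k] += mu * sign(err) * x[n - k]  # sign{ERR(n)} * sign{D(n-k)}
    return taps


residual = {3: 0.2, 4: -0.1, 5: 0.05, 6: -0.03}
learned = ss_lms_dfe(residual)
print({k: round(v, 3) for k, v in learned.items()})
```

With these made-up values the taps settle within a few step sizes of the residual cursors. Sign-sign updates trade adaptation precision, set by `mu`, for very simple hardware, which is the usual reason for choosing SS-LMS.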

The timing recovery is based on duobinary symbol-rate clock recovery performed in the voltage domain, in which the clock phase is set at the duobinary zero-level crossing point. If the received signal has no ISI and the sampling clock has no timing error, the duobinary signal at a data transition is always at the zero level. When the incoming

data transition from high to low (from low to high), the integrator output has a negative (positive) voltage for a leading clock phase. Hence, the clock phase is adjusted based on the direction of the data transition and the polarity of the integrator output. The timing recovery adaptation uses a baud-rate Mueller and Müller algorithm, which uses the error at each data transition and the sign of the data. The timing error can then be written as

$$\Delta \text{PHS} = \text{sign}\{\text{ERR}(n)\} \cdot \text{sign}\{D(n) - D(n-1)\}. \quad (7)$$

$\Delta \text{PHS}$ is negative when the phase should be increased to reach the optimum phase, as described in Fig. 17(c).



Fig. 24. (a) Measured integrator output eye opening. (b) Corresponding equalizer coefficients.



Fig. 25. Measured bathtub curves.

Since the duobinary zero level occurs only at data transitions, no additional information, such as a reference level, is required to perform timing recovery.
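The sign logic of (7) can be checked with a small bang-bang loop. Everything except the update rule is an assumption: the phase is a hypothetical integer phase-interpolator code, and the integrator output at a transition is modeled as a linear function of the phase error plus Gaussian noise, with its sign chosen so that $\Delta \text{PHS}$ is negative when the phase should be increased, matching the convention stated after (7). Only transitions are used, so no reference level appears in the loop.

```python
import random


def sign(v):
    """Return +1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mm_timing_recovery(phi_opt=23.3, phi0=0, slope=0.05, noise=0.02, n_bits=5000, seed=1):
    """Bang-bang phase loop driven by Eq. (7) on a hypothetical linear error model.

    Phase is an integer phase-interpolator code (hypothetical). At a transition the
    integrator output is modeled as slope * (phi - phi_opt) * sign(D(n) - D(n-1)) plus
    Gaussian noise, so that delta_phs < 0 whenever the phase should be increased.
    """
    rng = random.Random(seed)
    d_prev = rng.choice((0, 1))
    phi = phi0
    for _ in range(n_bits):
        d = rng.choice((0, 1))
        if d != d_prev:  # only transitions land on the duobinary zero level
            direction = sign(d - d_prev)
            err = slope * (phi - phi_opt) * direction + rng.gauss(0.0, noise)
            delta_phs = sign(err) * direction  # Eq. (7)
            phi -= delta_phs                   # negative delta_phs -> increase phase
        d_prev = d
    return phi


print(mm_timing_recovery())           # settles within a few codes of 23.3
print(mm_timing_recovery(noise=0.0))  # dithers between codes 23 and 24
```

Being a bang-bang loop, it never stops exactly at the optimum; it dithers by one code around it, and noise only occasionally pushes it one code further.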

## IV. EVALUATION RESULTS

The proposed adaptive integrating duobinary DFE receiver was fabricated in a 45 nm CMOS process; its microphotograph is shown in Fig. 18. The signaling configuration, a four-drop channel with seven slave ICs, is depicted in Fig. 19. The pair of slave ICs at each drop was modeled with two 0.9-pF capacitors, and the proposed receiver IC was attached as the seventh slave IC at the far end of the channel, which was terminated with on-chip resistors. A PRBS7 pattern was transmitted from the Tx at the near end with no pre-emphasis. Fig. 20 shows the measured single pulse response and the eye diagram at the Rx input pin. The eye is almost completely closed, and the single pulse response indicates that at least seven DFE taps are required.

To illustrate the operation of the proposed receiver, Fig. 21 shows a set of simulated single pulse responses at the integrator output captured at different clock phases. For ease of understanding, the waveforms show the voltage difference between the inverting and non-inverting outputs of the integrator. An early sampling clock phase produces a Post1 cursor integral amplitude larger than the main cursor integral amplitude (top waveforms), whereas a late clock phase produces a smaller one (bottom waveforms). Hence, by adapting the sampling point of the clock, the main and Post1 cursors can be made equal. The Post2 cursor is driven to zero by the adaptation of the duobinary equalizer. In practice, when the duobinary equalizer has small (large) values of $R_S$ and $C_S$, the Post2 cursor is usually positive (negative), with the main cursor lower (higher) than the Post1 cursor. Hence, as the Post2 cursor is adapted toward zero, the Post1 cursor also approaches the main cursor. Fig. 22 shows simulated waveforms at the duobinary equalizer output and the differential integrator output. The left and right waveforms correspond to an under-equalized and an over-equalized duobinary equalizer, respectively. With proper equalization (middle waveforms), ERR is minimized and the eye opening is maximized.
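The effect of matching the main and Post1 cursors while nulling Post2 can be seen in an idealized linear model of the sampled integrator output. In the sketch below, the cursor values and the function `sample_levels` are hypothetical; the model simply superposes normalized integrated cursors for every NRZ pattern and lists the distinct noiseless sample values.

```python
import itertools


def sample_levels(cursors):
    """Distinct noiseless integrator samples over all NRZ patterns.

    cursors = [main, post1, post2, ...]: normalized integrated pulse-response
    amplitudes (hypothetical). Each sample is the linear superposition
    y(n) = sum_k cursors[k] * D(n-k), with D in {-1, +1}.
    """
    levels = set()
    for bits in itertools.product((-1, 1), repeat=len(cursors)):
        y = sum(c * d for c, d in zip(cursors, bits))
        levels.add(round(y, 6) + 0.0)  # round away float noise; +0.0 folds -0.0 into 0.0
    return sorted(levels)


print(sample_levels([1.0, 1.0, 0.0]))  # main == Post1, Post2 == 0 -> [-2.0, 0.0, 2.0]
print(sample_levels([0.8, 1.0, 0.2]))  # main < Post1, Post2 > 0 -> [-2.0, -1.6, -0.4, 0.0, 0.4, 1.6, 2.0]
```

With main = Post1 and Post2 = 0 the samples collapse to the three duobinary levels, whereas a mismatched main/Post1 pair with a residual positive Post2 splits the zero level into several closely spaced values, which is the spread the adaptation removes.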

The adaptation procedure for the proposed receiver is shown in Fig. 23. From $t_1$ to $t_3$, $H_{1-2}$ of the duobinary equalizer and the phase of the sampling clock are adapted to form a duobinary signal and to remove the Post2 cursor. As the bottom waveforms in Fig. 23 show, the integrator output voltages, initially spread out by ISI, converge to the three duobinary signal levels as $H_{1-2}$ and the clock phase are optimized. From $t_3$ to $t_4$, adaptation of $H_3$–$H_6$ is also enabled, and the mean-squared error decreases as these coefficients are optimized.

Fig. 24 shows the measured integrator output eye openings with the corresponding equalizer coefficient values. The eye is closed with no equalization, opens to 240 mV with the duobinary equalizer only, and opens to 410 mV with the full implementation. The resulting measured bathtub curves in Fig. 25 show a horizontal opening of 12.2% with the duobinary equalizer only, which increases to 36% (at a bit error rate of $10^{-10}$) with the full implementation.

To examine how the proposed receiver scales with data rate and drop count, Fig. 26 shows simulated waveforms at the receiver input, the duobinary equalizer output, and the integrator output for data rates from 4 to 6 Gb/s in the four-drop configuration and for drop counts from four down to two at 6 Gb/s. As expected, a lower data rate or a smaller drop count yields a larger eye opening at the duobinary equalizer output and a higher voltage margin at the integrator output.

Table I compares the proposed receiver with several previously reported transceivers for multi-drop channels. The duobinary transceiver in [19] operates at up to 20 Gb/s, but on a point-to-point channel. The multi-tone approaches in [24] and [25] achieve relatively high data rates but require precomputation on the transmitter side, which adds overhead. The other approaches, including the proposed one, are receiver-only solutions that place no burden on the transmitter. If transmitter pre-emphasis were adopted, the data rates of these schemes could be expected to increase. Among these approaches, the proposed receiver provides the highest data rate with relatively small power and area. Its total power consumption breaks down into 0.405 mW for the analog front end (AFE), 0.765 mW for the DFE, 0.113 mW for the deserializer, and 1.165 mW for the clock and data recovery (CDR) and clock unit. Note that the proposed receiver achieves 5.8 Gb/s with seven slave ICs on a channel with impedance discontinuities at each drop and with no equalization on the transmitter side.
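The quoted figures can be cross-checked with simple arithmetic. Summing the four power components and dividing by the 5.8 Gb/s data rate gives the total power and an energy per bit, and the bathtub openings can be converted to time using the unit interval (UI). These are derived values, assuming both bathtub curves were taken at 5.8 Gb/s, and are not separately reported measurements.

$$P_{\text{total}} = 0.405 + 0.765 + 0.113 + 1.165 = 2.448\ \text{mW}, \qquad \frac{P_{\text{total}}}{5.8\ \text{Gb/s}} \approx 0.42\ \text{pJ/bit}$$

$$\text{UI} = \frac{1}{5.8\ \text{Gb/s}} \approx 172.4\ \text{ps}, \qquad 0.36\,\text{UI} \approx 62.1\ \text{ps}, \qquad 0.122\,\text{UI} \approx 21.0\ \text{ps}$$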



Fig. 26. Simulated waveforms of channel output (left), duobinary equalizer output (middle), and integrator output (right) at (a) 6 Gb/s four-drop, (b) 5 Gb/s four-drop, (c) 4 Gb/s four-drop, (d) 6 Gb/s three-drop, and (e) 6 Gb/s two-drop channel configurations.

## V. CONCLUSION

In this paper, an adaptive integrating duobinary DFE receiver is proposed for multi-drop interfaces. The proposed receiver combines duobinary signaling with an integrating DFE, which relaxes the DFE critical timing and embeds DFE taps in duobinary decoding to save power and area. Measurement results from a 45 nm CMOS prototype indicate that, in a four-drop channel with seven slave ICs, the proposed receiver achieves up to 5.8 Gb/s operating speed.

## REFERENCES

- [1] G. Motta, N. Sfondrini, and D. Sacco, “Cloud computing: an architectural and technological overview,” in *Proc. Int. Joint Conf. Service Sci.*, May 2012, pp. 23–27.
- [2] S. Chakravarthy, “Information processing: From file systems to cloud computing,” in *Proc. Int. Conf. Ind. Inf. Syst.*, Dec. 2014, p. 1.
- [3] T. Chen, J. Chen, and B. Zhou, “A system for parallel data mining service on cloud,” in *Proc. Int. Conf. Cloud Green Comput.*, Nov. 2012, pp. 329–330.
- [4] K. Sohn *et al.*, “A 1.2 V 30 nm 3.2 Gb/s/pin 4 Gb DDR4 SDRAM with dual-error detection and PVT-tolerant data-fetch scheme,” *IEEE J. Solid-State Circuits*, vol. 48, no. 1, pp. 168–177, Jan. 2013.
- [5] K. Krishna *et al.*, “A 0.6 to 9.6 Gb/s binary backplane transceiver core in 0.13-$\mu$m CMOS,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2005, pp. 64–65.
- [6] R. Payne *et al.*, “A 6.25 Gb/s binary adaptive DFE with first postcursor tap cancellation for serial backplane communications,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2005, pp. 68–69.
- [7] W.-H. Shin *et al.*, “A DFE receiver with equalized VREF for multidrop single-ended signaling,” *IEEE Trans. Circuits Syst. II, Express Briefs*, vol. 60, no. 7, pp. 412–416, Jul. 2013.
- [8] V. Balan *et al.*, “A 4.8–6.4 Gb/s serial link for backplane applications using decision feedback equalization,” *IEEE J. Solid-State Circuits*, vol. 40, no. 9, pp. 1957–1967, Sep. 2005.
- [9] K. L. J. Wong, A. Rylyakov, and C. K. K. Yang, “A 5-mW 6 Gb/s quarter-rate sampling receiver with a 2-tap DFE using soft decisions,” *IEEE J. Solid-State Circuits*, vol. 42, no. 4, pp. 881–888, Apr. 2007.
- [10] H. Fredriksson and C. Svensson, “2.6 Gb/s over a four-drop bus using an adaptive 12-tap DFE,” in *ESSCIRC, Dig.*, Sep. 2008, pp. 470–473.
- [11] H. Wang *et al.*, “A quad multi-speed serializer/deserializer with analog adaptive equalization,” in *Symp. VLSI Circuits, Tech. Dig. Papers*, Jun. 2004, pp. 340–343.
- [12] A. A. Fayed and M. Ismail, “A low-voltage low-power CMOS analog adaptive equalizer for UTP-5 cables,” *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 2, pp. 480–495, Mar. 2008.
- [13] G. R. Gangasani *et al.*, “A 16 Gb/s backplane transceiver with 12-tap current integrating DFE and dynamic adaptation of voltage offset and timing drifts in 45-nm SOI CMOS technology,” *IEEE J. Solid-State Circuits*, vol. 47, no. 8, pp. 1828–1841, Aug. 2012.
- [14] H.-J. Chi, J.-S. Lee, S.-H. Jeon, S.-J. Bae, J.-Y. Sim, and H.-J. Park, “A 3.2 Gb/s 8b single-ended integrating DFE RX for 2-drop DRAM interface with internal reference voltage and digital calibration,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2008, pp. 112–113.
- [15] H.-J. Chi *et al.*, “A single-loop SS-LMS algorithm with single-ended integrating DFE receiver for multi-drop DRAM interface,” *IEEE J. Solid-State Circuits*, vol. 46, no. 9, pp. 2053–2063, Sep. 2011.
- [16] S.-J. Bae, H.-J. Chi, Y.-S. Sohn, J.-S. Lee, J.-Y. Sim, and H.-J. Park, “A 2-Gb/s CMOS integrating two-tap DFE receiver for four-drop single-ended signaling,” *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 56, no. 8, pp. 1645–1656, Aug. 2009.
- [17] T. O. Dickson, J. F. Bulzacchelli, and D. J. Friedman, “A 12-Gb/s 11-mW half-rate sampled 5-tap decision feedback equalizer with current-integrating summers in 45-nm SOI CMOS technology,” *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1298–1305, Apr. 2009.
- [18] J. H. Sinsky, M. Duelk, and A. Adamiecki, “High-speed electrical backplane transmission using duobinary signaling,” *IEEE Trans. Microw. Theory Techn.*, vol. 53, no. 1, pp. 152–159, Jan. 2005.
- [19] J. Lee, M.-S. Chen, and H.-D. Wang, “Design and comparison of three 20-Gb/s backplane transceivers for duobinary, PAM4, and NRZ data,” *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 2120–2133, Sep. 2008.
- [20] S.-M. Lee *et al.*, “An 80 mV-swing single-ended duobinary transceiver with a TIA RX termination for the point-to-point DRAM interface,” *IEEE J. Solid-State Circuits*, vol. 49, no. 11, pp. 2618–2630, Nov. 2014.
- [21] K. Sunaga, H. Sugita, K. Yamaguchi, and K. Suzuki, “An 18 Gb/s duobinary receiver with a CDR-assisted DFE,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2009, pp. 274–275.
- [22] H.-W. Lim *et al.*, “A 5.8 Gb/s adaptive integrating duobinary-based DFE receiver for multi-drop memory interface,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [23] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl, and B. Nauta, “A double-tail latch-type voltage sense amplifier with 18ps setup+hold time,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2007, pp. 314–315.
- [24] W. T. Beyene and A. Amirkhani, “Controlled intersymbol interference design techniques of conventional interconnect systems for data rates beyond 20 Gbps,” *IEEE Trans. Adv. Packag.*, vol. 31, no. 4, pp. 731–740, Nov. 2008.
- [25] K. Gharibdoust, A. Tajalli, and Y. Leblebici, “A 7.5 mW 7.5 Gb/s mixed NRZ/multi-tone serial-data transceiver for multi-drop memory interfaces in 40nm CMOS,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 180–181.



**Hyun-Wook Lim** received the B.S. and M.S. degrees in electrical engineering from Korea University, Seoul, South Korea, in 1996 and 1998, respectively, and the Ph.D. degree in semiconductor and display engineering from Sungkyunkwan University, Suwon, South Korea, in 2015.

In 1998, he joined Samsung Electronics, Hwaseong, South Korea, where he was involved in the design of high-speed serial interfaces for DRAM, mobile display, and TV display applications. He currently leads the design group for high-speed serial interface and display enhancement IP for display driver ICs and timing controllers.



**Sung-Won Choi** received the B.S. degree in semiconductor systems engineering and the M.S. degree in semiconductor and display engineering from Sungkyunkwan University, Suwon, South Korea, in 2013 and 2015, respectively.

In 2015, he joined Samsung Electronics, Hwaseong, South Korea, where he was involved with the Memory Division. His current research interests include high-speed memory circuit design.



**Jeong-Keun Ahn** received the B.S. degree in electronics engineering from Ajou University, Suwon, South Korea, in 2004, and the M.S. degree in electronics and computer engineering from Hanyang University, Seoul, South Korea, in 2006. He is currently pursuing the Ph.D. degree with the Graduate School of Semiconductor and Display Engineering, Sungkyunkwan University, Suwon.

He joined Samsung SDI in 2006 and is currently with the OLED Development Team of Samsung Display. His current research interests include high-speed circuit design and signal integrity.



**Woong-Ki Min** received the B.S. degree in semiconductor systems engineering and the M.S. degree in semiconductor and display engineering from Sungkyunkwan University, Suwon, South Korea, in 2015 and 2017, respectively.

In 2017, he joined Samsung Electronics, Hwaseong, South Korea, where he is with the System LSI Business. His current research interests include high-speed CMOS interface circuit design and signal integrity.



**Sang-Kyu Lee** received the B.S. and M.S. degrees in electronic engineering from Kyung Hee University, Seoul, South Korea, in 2004 and 2006, respectively.

Since 2006, he has been with the Display Driver IC Development Team, Samsung Electronics Co., Ltd., Hwaseong, South Korea, where he is currently a Senior Engineer developing the physical-layer and link-layer digital architecture of high-speed interfaces such as the Mobile Industry Processor Interface (MIPI) and inter-panel interfaces. He has contributed to the MIPI DSI specification since 2013.



**Chang-Hoon Baek** received the B.S., M.S., and Ph.D. degrees in electrical engineering from the Pohang University of Science and Technology, Pohang, South Korea, in 1999, 2001, and 2006, respectively.

Since 2006, he has been developing display driver ICs for mobile applications and high-speed serial interfaces for various display applications. He is currently a Principal Engineer of System LSI Business with Samsung Electronics. His current research interests include analog circuit design and electrostatic discharge/electrical fast transient analysis.



**Jae-Youl Lee** received the Ph.D. degree in materials science and engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 1999.

From 1999 to 2003, he was with the DRAM Design Group, Hynix Semiconductor, Seoul, South Korea, where he designed a family of SDRAMs. In 2003, he joined KAIST as a Research Professor, where he worked on digital hearing aids and charge pump circuits. He joined Samsung Electronics, South Korea, and has been leading the development of high-speed serial interfaces for display driver ICs since 2004. He is currently a Master of System LSI Business with Samsung Electronics. His current research interests include high-speed serial interfaces for display applications.

Dr. Lee was the Far-East Regional Sub-Committee Chair of the International Solid-State Circuits Conference in 2014.



**Gyoo-Cheol Hwang** received the B.S. degree in electronics engineering from KyungPook National University, Daegu, South Korea, in 1987, and the master's and Ph.D. degrees in electronic engineering from the Advanced Institute of Science and Technology, Daejeon, South Korea, in 1989 and 1994, respectively.

In 1994, he joined Samsung Electronics Co., Ltd., as a very large scale integration (VLSI) Research and Development Engineer, and has worked in the VLSI and analog application area for more than 20 years. He is currently a General Manager with the System LSI Business Unit, Samsung Electronics Co., Ltd.



**Young-Hyun Jun** received the M.S. and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology, Seoul, South Korea, in 1986 and 1989, respectively.

From 1989 to 1991, he was a Research Associate with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Champaign, IL, USA, where he was involved in the design of analog circuits. Since 2014, he has been the President of the Memory Business Division, Samsung Electronics, South Korea. He has authored many technical papers and holds many patents related to semiconductor design. His current research interests include the development of high-speed DRAMs and I/O interfaces, flash memories, low-power very large scale integration circuits, and various analog circuit designs.



**Bai-Sun Kong** (M'14) received the B.S. degree in electronics engineering from Yonsei University, Seoul, South Korea, in 1990, and the M.S. and the Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology, Daejeon, South Korea, in 1992 and 1996, respectively.

From 1996 to 1999, he was a Senior Design Engineer with LG Semicon (currently SK Hynix Semiconductor), Seoul, where he was involved in the design of high-density and high-bandwidth DRAMs. In 2000, he joined the School of Electronics, Telecommunication and Computer Engineering, Korea Aerospace University, Goyang, South Korea, as an Assistant Professor. In 2005, he joined Sungkyunkwan University, Suwon, South Korea, where he is currently a Professor with the College of Information and Communication Engineering. His current research interests include high-performance microprocessor/memory architecture and circuit designs, high-speed transceiver design, neuromorphic integrated circuit design, and IC designs for low-power and high-speed applications.