

# Simultaneous Bidirectional Signaling for Die-to-Die Links: Signal Integrity Challenges and Hybrid Circuits

DURAND JARRETT-AMOR<sup>1</sup> (Member, IEEE), AND TONY CHAN CARUSONE<sup>1,2</sup> (Fellow, IEEE)  
(Invited Paper)

<sup>1</sup>Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 1A1, Canada

<sup>2</sup>Alphawave Semi Inc., LS1 4DL Leeds, U.K.

CORRESPONDING AUTHOR: T. C. CARUSONE (e-mail: tony.chan.carusone@isl.utoronto.ca)

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC).

**ABSTRACT** This article reviews simultaneous bidirectional signaling and its unique signal integrity challenges for die-to-die links: the near-end echo signal and the hybrid circuit required to remove it, signal reflections, and the impact of timing. A few key works in the design of simultaneous bidirectional transceivers are covered, such as dynamic reference-switching, the replica driver, and the split-termination hybrid, followed by a survey of recent simultaneous bidirectional transceivers for die-to-die links. Finally, we present our own split-termination, passive hybrid simultaneous bidirectional transceiver as a low-power alternative for die-to-die links.

**INDEX TERMS** Bandwidth, chiplets, die-to-die, full-duplex, heterogeneous integration, NRZ, simultaneous bidirectional, single-ended.

## I. INTRODUCTION

AS THE cost of manufacturing large silicon die in advanced technology nodes increases, putting more functionality into a single monolithic IC is becoming more costly [1], [2], [3], [4]. Furthermore, as transistor sizes shrink, they become more sensitive to electromigration, aging, and quantum effects [5]. As a result, heterogeneous integration has become a popular solution to these problems. A large monolithic IC is “chopped up” into “chiplets” (small die), and only the necessary portions of the system are implemented in advanced logic process technologies. The chiplets are then integrated in a single package to form a system-in-package (SiP).

Although the concept of heterogeneous integration is not new (multichip modules that integrate die onto a single substrate have been around since the 1970s [7]), heterogeneous integration via chiplets is a relatively new idea in the semiconductor industry. Indeed, there has been considerable industry activity since 2015. Some companies have licensed their chip-to-chip technologies to other companies for use in multichip applications, whereas others have started using chiplets in processors for the server and embedded systems markets [3], [8], [9]. Chiplets also allow for a silicon SiP that is larger than the maximum reticle size [6]. Even if the required logic, memory, and I/O circuits fit within the reticle limit, implementing a large SiP out of chiplets costs less [2], [3]. The savings arise because using chiplets to implement a large SiP improves yield [12]. Furthermore, chiplets for compute-intensive applications like AI, high-performance computing, and networking require high-bandwidth, power-efficient die-to-die (D2D) links, like the ones shown in Fig. 1 [11].

An important metric for such links is the beachfront density, defined as the aggregate TX bandwidth plus the aggregate RX bandwidth divided by the length of the die edge (in mm) occupied by the D2D interface. Typical beachfront densities for advanced (silicon interposer) and standard (organic) packages are 1.317-TB/s/mm and 224-GB/s/mm, respectively, with per-lane data rates up to 16-Gb/s for advanced packages and 32-Gb/s for standard packages [10]. Another key quantity for D2D links is power efficiency in pJ/bit. For a single transceiver, power efficiencies <0.6-pJ/bit and <1.25-pJ/bit are required for advanced and organic packages, respectively [10]. Bit error rate (BER) is also a critical performance specification for D2D links: BER requirements of $<10^{-15}$ (without forward error correction, due to the limited power and latency budget) need to be satisfied for data rates up to 32-Gb/s [10].

**FIGURE 1.** Example of a SiP using (a) a standard package and (b) an advanced package for connecting chiplets together.
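To make the two metrics above concrete, the following Python sketch computes beachfront density and per-lane transceiver power from a lane count, a per-lane data rate, and a die-edge length. The lane count, edge length, and efficiency in the example are hypothetical values chosen for illustration, not figures from [10].

```python
def beachfront_density_gbps_per_mm(tx_lanes, rx_lanes, gbps_per_lane, edge_mm):
    """Aggregate TX bandwidth plus aggregate RX bandwidth per mm of die edge (Gb/s/mm)."""
    return (tx_lanes + rx_lanes) * gbps_per_lane / edge_mm


def transceiver_power_mw(pj_per_bit, gbps):
    """Power in mW for an energy efficiency in pJ/bit at a data rate in Gb/s.

    1 pJ/bit x 1 Gb/s = 1e-12 J x 1e9 1/s = 1e-3 W, so the product is already in mW.
    """
    return pj_per_bit * gbps


# Hypothetical interface: 64 TX and 64 RX lanes at 16 Gb/s on a 1-mm edge.
density = beachfront_density_gbps_per_mm(64, 64, 16.0, 1.0)
print(density, "Gb/s/mm =", density / 8, "GB/s/mm")
print(transceiver_power_mw(0.5, 16.0), "mW per lane at 0.5 pJ/bit")
```

With these assumed numbers the interface reaches 2048 Gb/s/mm (256 GB/s/mm), and each lane at 0.5 pJ/bit and 16 Gb/s dissipates 8 mW.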

However, to keep up with future bandwidth requirements, the per-lane data rates of ultra-short-reach (USR) D2D links will have to increase [13]. This is because bandwidth demand typically grows faster than the number of I/O bumps that can be added to a package, especially in advanced packages such as silicon interposers [14]. Increasing the data rate, however, means the D2D links will operate at higher insertion losses, which can increase power consumption if equalization is required. The insertion loss is even higher for advanced packaging technologies, such as silicon interposers, than for organic packages, because their thin traces make the interconnects highly resistive [15], [16]. Furthermore, if the chiplet is bump-limited (typically the case for organic packages), then the pin efficiency (number of bits sent plus received per pin) of the D2D I/O becomes a critical issue. Alternative signaling solutions will therefore need to be investigated to solve these problems. One possible solution is single-ended simultaneous-bidirectional (SBD) NRZ signaling.

Single-ended SBD NRZ signaling would improve the pin efficiency of a D2D link by  $2\times$  compared to single-ended unidirectional (UD) NRZ signaling by transmitting and receiving data simultaneously (full-duplex) over the same channel, resulting in 200% pin efficiency of the D2D interface. Furthermore, a single-ended SBD D2D link could operate at a lower data rate in each direction compared to a single-ended UD D2D link while achieving the same aggregate bandwidth (TX + RX bandwidth), which allows the SBD transceiver to operate at a lower insertion loss, reduce the impact of ISI, and save power [61].

The aim of this article is to provide an overview of SBD NRZ signaling for D2D links, so as to identify some of its signal integrity challenges and to present various approaches to overcoming them. Section II discusses the signal integrity issues in SBD signaling. Section III reviews various SBD architectures in the literature that attempt to overcome these challenges. Section IV presents the design and simulation of a low-power SBD transceiver for D2D links, and Section V concludes this article.

## II. SIGNAL INTEGRITY CHALLENGES IN SBD SIGNALING FOR D2D LINKS

D2D links use single-ended signaling to provide a more efficient use of the I/O bumps that are available on a die. Consequently, crosstalk and signal reflections are some of the key design constraints. In addition to this, SBD signaling for D2D links will have its own unique signal integrity challenges. In this section, we will discuss signal integrity challenges specific to D2D links, then compare single-ended UD signaling to single-ended SBD signaling, after which we will examine the signal integrity challenges in SBD signaling.

### A. SIGNAL INTEGRITY CHALLENGES FOR D2D LINKS

D2D links have signal integrity requirements that are quite different from those of chip-to-chip (C2C) links over, for example, a printed circuit board (PCB). This is because the traces in a D2D link only traverse a path from one die to another die within the same package, as shown in Fig. 1, with trace lengths typically $<25\text{-mm}$ [10]. In contrast, C2C links over PCB may traverse a path from one packaged chip on a line card to another packaged chip on a separate line card via a PCB trace up to tens of inches long [58]. The insertion loss of D2D links is therefore far less severe than that of C2C links. This is highlighted in Fig. 2, which compares the insertion loss and pulse response of a 5-mm D2D link in organic substrate against those of a 12-inch C2C link via a backplane channel. As seen in Fig. 2(a) and (b), $S_{21}$ of the D2D link at 12-GHz is only $-0.75\text{-dB}$, compared to $-35\text{-dB}$ for the C2C link, resulting in significantly more attenuation and ISI in the pulse response of the C2C link than in that of the D2D link, as shown in Fig. 2(c) and (d).

On the other hand, crosstalk is a significant concern for D2D links due to the tight spacing between I/O bumps needed for high aggregate bandwidths ($>1\text{-Tb/s}$), where the bump pitch can be $110\text{-}\mu\text{m}$ for organic packages and $45\text{-}\mu\text{m}$ for advanced packages [10]. Such tight spacing makes crosstalk one of the main signal integrity constraints for D2D links, where the power-sum crosstalk can be as high as $-24\text{-dB}$ at Nyquist at 16-Gb/s [10]. In addition to crosstalk, signal reflections play an important role in the signal integrity of D2D links. Due to the low loss of D2D links, signal reflections are not attenuated as much, increasing ISI and data-dependent jitter [60]. The appropriate termination of a D2D link, however, depends on the package substrate as well as on the data rate and reach of the link [10]. For example, organic packages may be unterminated at the RX for channel lengths up to 25-mm at data rates <4-Gb/s if the transmitter swing is large enough (>0.85-V) [10]. At higher data rates, for example 16-Gb/s, and full transmitter swing, the RX can be left unterminated for channel lengths <5-mm [10]. In contrast, advanced packages may require no termination at the RX for any data rate or channel length [10]. Furthermore, the TX and RX impedances in D2D links need not match the characteristic impedance of the channel; relaxing the matching requirement can increase the bandwidth of the link, reduce its power consumption, or help satisfy the high-level jitter budget [15]. Just as channel termination affects the signal integrity of a D2D link via signal reflections, the pad capacitance can also degrade signal integrity by limiting the rise and fall times at the TX output and RX input. The pad capacitance varies with the speed and package substrate of the D2D link, with values <250-fF at all data rates for advanced packages and <125-fF at 32-Gb/s for organic packages [10].

**FIGURE 2.** Top: (a)  $S_{21}$  plot of a 5-mm D2D link in organic substrate. (b)  $S_{21}$  plot of a 12-inch C2C link over PCB [59]. Bottom: (c) Pulse response of a 5-mm D2D link in organic substrate at 24-Gb/s. (d) Pulse response of a 12-inch C2C link over PCB at 24-Gb/s [59].
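A single-pole RC estimate gives a feel for how the pad capacitance limits edge rates. The sketch below assumes the pad sees an effective resistance of 50 Ω (a hypothetical value; the real value depends on the driver and on the termination, or its absence) and uses the 250-fF upper bound quoted above for advanced packages.

```python
import math


def rc_edge_limits(r_ohm, c_farad):
    """Single-pole estimate of the -3 dB bandwidth and 10-90% rise time of an RC node."""
    tau = r_ohm * c_farad
    f_3db_hz = 1.0 / (2.0 * math.pi * tau)
    # A single pole rises from 10% to 90% in tau * ln(9), roughly 2.2 * tau.
    t_rise_s = tau * math.log(9.0)
    return f_3db_hz, t_rise_s


f_3db, t_rise = rc_edge_limits(50.0, 250e-15)  # assumed 50-ohm node, 250-fF pad
ui_16g = 1.0 / 16e9                             # one UI at 16 Gb/s
print(f"f_3dB = {f_3db / 1e9:.1f} GHz, t_r = {t_rise * 1e12:.1f} ps, "
      f"UI = {ui_16g * 1e12:.1f} ps")
```

Under these assumptions the pad alone limits the node to roughly 12.7 GHz, and its rise time is close to half of a 62.5-ps UI at 16 Gb/s.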

### B. SBD VERSUS UD SIGNALING FOR D2D LINKS

A $2\times$ improvement in pin efficiency can be achieved for a single-ended link if the I/O for a given channel can simultaneously transmit and receive data on the shared channel, i.e., via SBD (full-duplex) signaling. Thus, SBD signaling can achieve the same bandwidth per bump as UD signaling at half the data rate in each direction, and hence at half the Nyquist frequency. Operating at half the Nyquist frequency not only reduces ISI but can also reduce the impact of crosstalk and save power in the clocking circuitry.
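The arithmetic behind this claim is simple enough to sketch. The hypothetical helper below compares a UD lane at 32 Gb/s with an SBD lane at 16 Gb/s in each direction, taking the NRZ Nyquist frequency as half the per-direction symbol rate.

```python
def nrz_lane(gbps_per_direction, bidirectional):
    """Per-pin figures for a single-ended NRZ lane."""
    directions = 2 if bidirectional else 1
    return {
        # pin efficiency: bits sent plus received per pin, per UI
        "bits_per_pin_per_ui": directions,
        "aggregate_gbps_per_pin": directions * gbps_per_direction,
        # NRZ Nyquist frequency is half the symbol rate in each direction
        "nyquist_ghz": gbps_per_direction / 2.0,
    }


ud = nrz_lane(32.0, bidirectional=False)
sbd = nrz_lane(16.0, bidirectional=True)
print(ud)   # 32 Gb/s per pin with a 16-GHz Nyquist frequency
print(sbd)  # 32 Gb/s per pin with an 8-GHz Nyquist frequency
```

Both lanes carry the same aggregate bandwidth per pin, but the SBD lane only has to support half the Nyquist frequency.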

However, SBD signaling faces unique signal integrity challenges because of the echo at the receiver input caused by the near-end transmitter, also known as near-end echo.

**FIGURE 3.** (a) Worst-case eye diagram for UD signaling from (1). (b) Worst-case eye diagram for SBD signaling from (2). Both signaling schemes were compared at the same data rate.

First, consider a unidirectional link with symbol interval $T_{\text{bit}}$ and pulse response $p_U^{(d)}(t)$ for the transmitted symbol $d \in \{0, 1\}$. Given the worst-case ISI symbol sequence, $d_k$, for a transmitted “1”, a peak-distortion analysis [57] gives the worst-case received waveform for a “1” at the receiver input as the sum of the main cursor, $y_U^{(1)}(t)$, and every ISI term that reduces it:

$$s_U^{(1)}(t) = \overbrace{y_U^{(1)}(t)}^{\text{main cursor}} + \overbrace{\sum_{k=-\infty, \neq 0}^{\infty} p_U^{(d_k)}(t - kT_{\text{bit}})}^{\text{ISI}} \Big|_{p_U^{(d_k)}(t - kT_{\text{bit}}) < 0}. \quad (1)$$

For SBD signaling, additional near-end echo and far-end reflection terms arise because the near-end transmitter drives the same physical wire and because of far-end impedance mismatch. Therefore, denoting the echo and reflection pulse responses by $e^{(u_k)}(t)$ and $r^{(u_k)}(t)$, respectively, and the worst-case near-end transmitter symbol sequence for a received “1” by $u_k \in \{0, 1\}$, the worst-case received waveform for a “1” becomes

$$\begin{aligned} s_B^{(1)}(t) = & \overbrace{y_B^{(1)}(t)}^{\text{main cursor}} + \overbrace{\sum_{k=-\infty, \neq 0}^{\infty} p_B^{(d_k)}(t - kT_{\text{bit}})}^{\text{ISI}} \Big|_{p_B^{(d_k)}(t - kT_{\text{bit}}) < 0} \\ & + \overbrace{\sum_{k=-\infty}^{\infty} e^{(u_k)}(t - kT_{\text{bit}})}^{\text{Near-end TX echo signal}} \Big|_{e^{(u_k)}(t - kT_{\text{bit}}) < 0} \\ & + \overbrace{\sum_{k=-\infty}^{\infty} r^{(u_k)}(t - kT_{\text{bit}})}^{\text{Far-end signal reflection due to near-end TX}} \Big|_{r^{(u_k)}(t - kT_{\text{bit}}) < 0}. \end{aligned} \quad (2)$$

The echo and far-end reflection terms in (2), due to the near-end TX and far-end impedance mismatch, introduce signal integrity impairments unique to SBD signaling and reduce the eye height compared to UD signaling, as shown in Fig. 3. These additional impairments must therefore be limited by 1) minimizing the echo response, $e^{(u_k)}(t)$, and 2) ensuring proper termination of the link.
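A sampled version of the peak-distortion analysis in (1) and (2) can be sketched in Python with NumPy. It evaluates the pulse, echo, and reflection responses at a single sampling phase (one sample per $T_{\text{bit}}$), assumes a transmitted “0” produces no pulse (unipolar NRZ), and keeps only the samples that close the eye, mirroring the conditions under the sums in (1) and (2). The worst-case “0” level and the resulting eye height are an extension added here for illustration, and the cursor values are made up.

```python
import numpy as np


def worst_case_eye(pulse, main_idx, echo=(), reflection=()):
    """Peak-distortion worst-case '1' level, '0' level, and eye height.

    pulse      -- samples of the received pulse response at t_s + k*T_bit
    main_idx   -- index of the main cursor within `pulse`
    echo       -- samples of the near-end echo pulse response e(t_s + k*T_bit)
    reflection -- samples of the far-end reflection pulse response r(t_s + k*T_bit)
    """
    pulse = np.asarray(pulse, dtype=float)
    isi = np.delete(pulse, main_idx)  # every cursor except the main one
    interference = np.concatenate(
        [isi, np.asarray(echo, dtype=float), np.asarray(reflection, dtype=float)]
    )
    # Worst-case '1': main cursor plus every negative interference sample.
    worst_one = pulse[main_idx] + interference[interference < 0].sum()
    # Worst-case '0': no main cursor, plus every positive interference sample.
    worst_zero = interference[interference > 0].sum()
    return worst_one, worst_zero, worst_one - worst_zero


pulse = [0.05, 1.0, 0.2, -0.1]  # hypothetical pre-, main, and post-cursors
print(worst_case_eye(pulse, 1))                          # UD, as in (1)
print(worst_case_eye(pulse, 1, echo=[-0.1, 0.15, 0.05]))  # SBD with echo, as in (2)
```

In this toy example the echo samples shrink the worst-case eye from about 0.65 to about 0.35 of the main cursor, the same qualitative effect shown in Fig. 3.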

### C. SIGNAL INTEGRITY CHALLENGES OF SBD SIGNALING

Having compared UD and SBD signaling, we now focus solely on the signal integrity issues of SBD signaling for D2D links, for which the dominant concern is the near-end echo term in (2). As such, a hybrid circuit is needed to separate the desired signal sent by the far-end transmitter from the interference due to the near-end TX, as shown in Fig. 4.

**FIGURE 4.** Simplified model of an SBD link.

Referring to Fig. 4, it can be shown that the transfer function for the echo response term,  $H_e(s)$ , in (2) is given by

$$H_e(s) = H_{13}(s) - H_{12}(s)H_{24}(s) \quad (3)$$

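so that the near-end echo signal at the hybrid output, $v_{ne,ech}(t)$, is the convolution ($\otimes$) of the near-end interference signal, $v_{ne,int}(t)$, with the echo impulse response, $h_e(t)$, i.e., the inverse Laplace transform of $H_e(s)$: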

$$v_{ne,ech}(t) = v_{ne,int}(t) \otimes h_e(t). \quad (4)$$

Therefore, the near-end echo signal in (2) can be removed if the transfer functions of the hybrid satisfy the following relationship:

$$H_{13}(s) = H_{12}(s)H_{24}(s). \quad (5)$$

Note that (3) can be obtained from a scattering matrix (S-matrix) representation of the hybrid by turning off the far-end TX and evaluating the signal at the output of the hybrid due to the near-end TX with the channel, hybrid, and TX on the other side of the channel as a load, as shown in Fig. 5. However, the details of how (3) is derived from an S-matrix representation of the hybrid are beyond the scope of this work. Nonetheless, (5) can be used to synthesize a hybrid circuit for SBD signaling. Indeed, many of the hybrids to be discussed in Section III can be shown to satisfy (5).

Satisfying the criterion in (5) ensures that the near-end TX signal entering port 1 in Fig. 5 reaches port 3 directly, and port 4 via port 2, with the same transfer function at all frequencies, so that it cancels perfectly in the hybrid's differential output, $v_{hyb}$. In practice, satisfying (5) at all frequencies is very difficult because of the different loads seen at port 2 and at ports 3 and 4. As a result, a residual near-end echo signal appears at the hybrid output, as shown in Fig. 4. Much research in SBD signaling has therefore focused on minimizing this echo signal.
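The frequency dependence of the residual echo can be illustrated by evaluating (3) numerically. In the NumPy sketch below, $H_{13}(s)$, $H_{12}(s)$, and $H_{24}(s)$ are hypothetical single-pole responses whose DC gains satisfy (5) but whose poles differ, standing in for the different loads at the hybrid ports.

```python
import numpy as np


def single_pole(gain, pole_hz):
    """Hypothetical transfer function gain / (1 + s / (2*pi*pole_hz))."""
    return lambda s: gain / (1.0 + s / (2.0 * np.pi * pole_hz))


def echo_response(f_hz, h13, h12, h24):
    """Evaluate H_e(s) = H13(s) - H12(s) * H24(s), as in (3), at s = j*2*pi*f."""
    s = 2j * np.pi * np.asarray(f_hz, dtype=float)
    return h13(s) - h12(s) * h24(s)


h13 = single_pole(0.25, 20e9)  # direct path, port 1 to port 3
h12 = single_pole(0.5, 15e9)   # port 1 to port 2 (loaded by the channel)
h24 = single_pole(0.5, 1e12)   # port 2 to port 4
f = np.array([0.0, 1e9, 8e9, 16e9])
residual = np.abs(echo_response(f, h13, h12, h24))
print(residual)  # 0 at DC, rising with frequency because the poles differ
```

Matching (5) only at DC leaves an echo that grows with frequency, which is why broadband matching of the hybrid ports is the hard part of the design.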

However, unlike the near-end echo signal, the signal reflection term in (2) due to far-end impedance mismatch cannot be eliminated by minimizing the transfer function for the echo response term, since the signal reflection sees a different transfer function through the hybrid. This reflection term is particularly important in D2D links, because their low insertion loss attenuates the reflected signals only slightly. Therefore, most research in SBD signaling for D2D links has centered on reducing the signal reflection term in (2), either by minimizing the far-end impedance mismatch or by using correlation-based echo cancellation [18], [21], [28].
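A first-order estimate shows why D2D reflections matter. The sketch below multiplies the far-end reflection coefficient by the round-trip channel loss. It treats the $-0.75$-dB insertion loss of the 5-mm organic link in Fig. 2(a) as a single, frequency-independent number, assumes $Z_0 = 50\ \Omega$, and ignores re-reflection at the near end; all three are simplifications chosen for illustration.

```python
import math


def reflection_coefficient(z_load, z0):
    """Voltage reflection coefficient of a load on a line of impedance z0."""
    if math.isinf(z_load):
        return 1.0  # open circuit, e.g., an unterminated RX (pad capacitance ignored)
    return (z_load - z0) / (z_load + z0)


def returned_fraction(z_load, z0, one_way_loss_db):
    """Fraction of the near-end TX step that reflects at the far end and returns.

    one_way_loss_db is a positive loss (0.75 for an S21 of -0.75 dB).
    The reflected wave crosses the channel twice, so it sees twice the one-way loss.
    """
    return reflection_coefficient(z_load, z0) * 10 ** (-2.0 * one_way_loss_db / 20.0)


for z_load in (50.0, 100.0, math.inf):
    print(z_load, round(returned_fraction(z_load, 50.0, 0.75), 3))
```

Under these assumptions, an unterminated far end sends roughly 84% of the outbound step back to the near-end hybrid, and even a 2:1 mismatch returns about 28%.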

Before discussing the prior art, we should mention another signal integrity issue unique to SBD signaling: timing, which has been discussed in [43], [46], [48], and [51]. Referring to Fig. 4, the received signal at the output of the hybrid, $v_{hyb}$, on each die can be written as a superposition of the far-end recovered signal, $v_{fe,rec}(t)$, and the near-end echo signal due to the near-end TX, $v_{ne,ech}(t)$:

$$v_{hyb_A}(t) = v_{fe,rec_A}(t) + v_{ne,ech_A}(t) \quad (6)$$

$$v_{hyb_B}(t) = v_{fe,rec_B}(t) + v_{ne,ech_B}(t). \quad (7)$$

SBD transceivers for D2D links use clock forwarding and are mesosynchronous: both dies operate at the same data rate, but with an arbitrary phase relationship. We therefore assume the same clock frequency on die A and die B in Fig. 4, but with a relative phase lead, $\Delta t$, of the TX clock on die B relative to die A. The transmitted signal on die A is then delayed by $\Delta t$ with respect to the transmitted signal on die B, so the desired signal at the output of the hybrid on die B (the signal sent from die A) is also delayed by $\Delta t$ (neglecting the delay of the channel and the delay through the hybrid). At the same time, the near-end echo signal on die A is delayed by $\Delta t$ as well. Equations (6) and (7) can therefore be modified to include these phase delays:

$$v_{hyb_A}(t) = v_{fe,rec_A}(t) + v_{ne,ech_A}(t - \Delta t) \quad (8)$$

$$v_{hyb_B}(t) = v_{fe,rec_B}(t - \Delta t) + v_{ne,ech_B}(t) \quad (9)$$

where (9) can also be expressed as

$$v_{hyb_B}(t) = v_{fe,rec_B}(t) + v_{ne,ech_B}(t + \Delta t). \quad (10)$$

Therefore, the near-end echo signals on dies A and B are shifted by $\Delta t$ and $-\Delta t$, respectively, with respect to the far-end recovered signal in each receiver. This implies that the signal integrity on each die has a different dependence on $\Delta t$. Regardless of the choice of receiver clock phases on dies A and B, the eye opening on die A could be greater or smaller than that on die B, depending on the timing of $v_{ne,ech_A}$ and $v_{ne,ech_B}$, respectively. Thus, the eye openings at both hybrid outputs should be considered in determining the optimal relative TX clock phase, as shown in Fig. 6(a).

Now suppose instead that the channel introduces an additional delay, $t_d$. Then (6) and (7) can be rewritten as

$$v_{hyb_A}(t) = v_{fe,rec_A}(t - t_d) + v_{ne,ech_A}(t) \quad (11)$$

$$v_{hyb_B}(t) = v_{fe,rec_B}(t - t_d) + v_{ne,ech_B}(t). \quad (12)$$

**FIGURE 5.** S-matrix representation of the hybrid in Fig. 4.

**FIGURE 6.** Impact of (a) time offset, $\Delta t$, between TX clocks on each die in Fig. 4 and (b) channel delay, $t_d$.

Note that in this case, the far-end recovered signal on each die is delayed by the same amount, $t_d$, relative to the near-end echo signal because of the additional channel delay. This affects the optimal sampling phase of the receiver clock on each die. Combining (8)–(12), we can write expressions for the waveforms at the hybrid output in the presence of both a phase lead in the TX clock on die B and an additional channel delay:

$$v_{hyb_A}(t) = v_{fe,rec_A}(t - t_d) + v_{ne,ech_A}(t - \Delta t) \quad (13)$$

$$v_{hyb_B}(t) = v_{fe,rec_B}(t - t_d) + v_{ne,ech_B}(t + \Delta t). \quad (14)$$

Evidently, the impact of the delay (hence, length) of the interconnect between the two SBD transceivers cannot be undone on both ends of the link simultaneously simply by varying the relative phases of the TX or RX clocks. Thus, some channel lengths will result in echo peaks that impair signal integrity particularly badly, irrespective of the choice of TX and RX clock phases. For example, if $t_d = T_{bit}/2$ and we sample the received signal on die A ($v_{fe,rec_A} + v_{ne,ech_A}$) in Fig. 6(b) at $T_{bit}/2$, where the near-end echo signal has its negative peak, then we would not be sampling at the peak of the superimposed signal at the hybrid output. Adjusting the relative TX clock phases realigns the echo and recovered signals differently on dies A and B, so there is no guarantee that it will help both. Given the uncertainty and variability of trace lengths in D2D interfaces, it is therefore essential to ensure that the echo signals passing through the hybrid are very small.
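The argument above can be checked with a little arithmetic on (13) and (14). Relative to the far-end recovered signal, the echo on die A is delayed by $\tau_A = \Delta t - t_d$ and the echo on die B by $\tau_B = -\Delta t - t_d$, so $\tau_A + \tau_B = -2t_d$ no matter how $\Delta t$ is chosen. The sketch below wraps both offsets into one bit period, assuming random near-end data so that only the offset modulo $T_{\text{bit}}$ matters; the function name and normalized units are illustrative.

```python
def echo_offsets(delta_t, t_d, t_bit):
    """Echo delay relative to the far-end recovered signal on dies A and B.

    Follows (13) and (14): the echo is shifted by +delta_t on die A and by
    -delta_t on die B, while the recovered signal is delayed by t_d on both.
    Offsets are wrapped into [0, t_bit).
    """
    tau_a = (delta_t - t_d) % t_bit
    tau_b = (-delta_t - t_d) % t_bit
    return tau_a, tau_b


t_bit, t_d = 1.0, 0.3  # normalized to one bit period
for delta_t in (0.0, 0.1, 0.25, 0.45):
    tau_a, tau_b = echo_offsets(delta_t, t_d, t_bit)
    # The sum (mod t_bit) depends only on t_d: changing delta_t moves the
    # echo one way on die A and the opposite way on die B.
    print(delta_t, round(tau_a, 3), round(tau_b, 3), round((tau_a + tau_b) % t_bit, 3))
```

Because the TX phase only trades echo offset between the two dies, a channel delay that places the echo badly for both cannot be tuned away.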

Aside from minimizing the amplitude of $v_{ne,ech}$, the impact of the near-end echo signal can also be reduced by exploiting the timing relationships in (13) and (14). Since (5) cannot be satisfied at all frequencies in practice, a residual near-end echo signal will appear at the hybrid's output. Furthermore, since an SBD link is mesosynchronous, the near-end echo signal will appear at the hybrid output with a relative phase offset $\Delta t$, i.e., $v_{ne,ech}(t \pm \Delta t)$, with respect to the far-end recovered signal, which is delayed by the channel, i.e., $v_{fe,rec}(t - t_d)$. As shown in Fig. 7(a), this can result in the negative peak of the near-end echo signal coinciding with the peak of the far-end recovered signal, thus reducing the amplitude of $v_{hyb}$ at the RX clock sampling time, $t_s$. However, if the phase offset, $\Delta t$, and channel delay, $t_d$, are known, then the near-end echo signal (and/or the far-end recovered signal) can be time-shifted by $\pm \Delta t$ so as to place the negative peak of $v_{ne,ech}$ at the edge of $v_{fe,rec}$, resulting in a larger amplitude of $v_{hyb}$ at $t_s$, as shown in Fig. 7(b). This approach was used in the design of the hybrid in [26].

## III. REVIEW OF PRIOR ART

Having presented the signal integrity challenges of SBD signaling for D2D links, we now discuss prior art in SBD transceivers. We begin with three prior-art SBD transceivers designed for long-reach links, each with a different approach to removing the near-end interference signal. We then cover three recent works with a specific focus on D2D links.

**FIGURE 7.** (a) Sample pulse responses for  $v_{hyb}$ ,  $v_{fe,rec}$ , and  $v_{ne,ech}$  with  $t_d = T_{bit}$  and  $\Delta t = T_{bit}/2$ . (b) Pulse responses from (a) but with  $v_{ne,ech}$  delayed by an additional  $\Delta t = T_{bit}/2$  to shift its negative peak to the edge of  $v_{fe,rec}$ .

**TABLE 1.** Values of REF A and REF B for different combinations of data A and data B.

| Data A | Data B | REF A       | REF B       | Line       |
|--------|--------|-------------|-------------|------------|
| 0      | 0      | $V_{CC}/4$  | $V_{CC}/4$  | $V_{SS}$   |
| 0      | 1      | $V_{CC}/4$  | $3V_{CC}/4$ | $V_{CC}/2$ |
| 1      | 0      | $3V_{CC}/4$ | $V_{CC}/4$  | $V_{CC}/2$ |
| 1      | 1      | $3V_{CC}/4$ | $3V_{CC}/4$ | $V_{CC}$   |

### A. DYNAMIC REFERENCE SWITCHING HYBRID

The work in [43] used dynamic reference switching to adjust the reference level of the downstream differential amplifier via a 2:1 MUX controlled by the outbound data signal, as shown in Fig. 8. In particular, the reference level on die A, REF A, is selected according to whether the outbound data is a 0 or a 1; REF B on die B is selected in the same way. Table 1 lists the states of data A, data B, REF A, REF B, and the signal on the line for all possible bit combinations.
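The decoding implied by Table 1 can be written as a short behavioral model. The sketch below is an idealized logic-level illustration with hypothetical function names, normalized $V_{CC} = 1$, $V_{SS} = 0$, and no noise or timing effects. The line voltage is $(\text{data A} + \text{data B})V_{CC}/2$, each receiver picks $V_{CC}/4$ or $3V_{CC}/4$ according to its own outbound bit, and a comparison against that reference recovers the far-end bit with $V_{CC}/4$ of margin in every state.

```python
def line_voltage(data_a, data_b, vcc=1.0):
    """Line voltage from Table 1: V_SS (taken as 0), V_CC/2, or V_CC."""
    return (data_a + data_b) * vcc / 2.0


def switched_reference(own_data, vcc=1.0):
    """Reference chosen by the 2:1 MUX from the local outbound bit (Table 1)."""
    return 0.75 * vcc if own_data else 0.25 * vcc


def receive(own_data, v_line, vcc=1.0):
    """Recover the far-end bit by comparing the line against the switched reference."""
    return int(v_line > switched_reference(own_data, vcc))


for a in (0, 1):
    for b in (0, 1):
        v = line_voltage(a, b)
        # Die A recovers data B and die B recovers data A from the same line voltage.
        print(a, b, v, receive(a, v), receive(b, v))
```

The model makes the weak point of the scheme visible: the comparison is only correct if the reference switches at the same instant as the line, which is the delay-matching problem discussed below.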

The dynamic reference switching scheme essentially provides a way to digitally decode the signal on the line, where the reference levels are generated on-chip and shared via two separate channels. This is done to track any power supply noise on each chip. Furthermore, the reference impedances, $Z_1$ and $Z_2$, are selected so that the Thevenin equivalent circuit of the reference generator has a resistance equal to $Z_0$ and a Thevenin voltage equal to the desired reference level. This ensures that noise from the transmitter appears at both inputs of the receiver as a common-mode signal.

**FIGURE 8.** Dynamic reference-switching SBD transceiver used in [43].
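The reference-impedance condition has a closed-form solution if the reference generator is modeled as a resistive divider. Assuming (hypothetically; Fig. 8 fixes the actual topology) that $Z_1$ connects $V_{CC}$ to the reference node and $Z_2$ connects that node to ground, requiring a Thevenin voltage $V_{ref}$ and resistance $Z_0$ gives $Z_1 = Z_0 V_{CC}/V_{ref}$ and $Z_2 = Z_0 V_{CC}/(V_{CC} - V_{ref})$. The sketch computes these values and checks them.

```python
def reference_divider(v_ref, vcc, z0):
    """Divider resistors (Z1 to VCC, Z2 to ground) with Thevenin voltage v_ref and resistance z0."""
    z1 = z0 * vcc / v_ref
    z2 = z0 * vcc / (vcc - v_ref)
    return z1, z2


def thevenin(z1, z2, vcc):
    """Thevenin voltage and resistance seen looking into the divider node."""
    return vcc * z2 / (z1 + z2), z1 * z2 / (z1 + z2)


z0, vcc = 50.0, 1.0  # assumed line impedance and normalized supply
for v_ref in (0.25 * vcc, 0.75 * vcc):  # the two reference levels in Table 1
    z1, z2 = reference_divider(v_ref, vcc, z0)
    print(v_ref, round(z1, 2), round(z2, 2), thevenin(z1, z2, vcc))
```

Under this assumed topology, the $V_{CC}/4$ and $3V_{CC}/4$ generators use the same two values, $4Z_0$ and $4Z_0/3$, in swapped positions.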

There are several challenges with the approach used in [43]. First, matching the delay from the output of the predriver to the 2:1 MUX with the delay from the predriver to the output driver is very challenging, especially at high speed. Second, any kickback noise from the 2:1 MUX after a switching event will appear on the reference input to the amplifier, possibly causing decision errors. Furthermore, the delay of the 2:1 MUX becomes an issue for multi-Gb/s operation, since it must be much less than 1 UI to correctly decipher the signal on the line. Finally, sharing the reference voltages across many lanes (100, for example) becomes challenging at very high speeds (>10-Gb/s). Each group of lanes would then need its own reference generators, incurring an area penalty for the two extra reference channels per group.

Compared to more recent SBD transceivers, dynamic reference switching is the least popular of the hybrid architectures because of the difficulty of operating at >10-Gb/s. The only other works to use this method were [42], [45], [47], [48], [50], and [54], each of which operated at full-duplex data rates <10-Gb/s.

### B. REPLICA DRIVER HYBRID

**FIGURE 9.** Replica driver SBD transceiver used in [51]. A single-ended implementation is shown for simplicity.

In [51], a scaled replica driver was used to remove the near-end interference signal by reproducing it with a replica driver and subtracting this signal from the signal on the line, as shown in Fig. 9. The approach relies on producing a replica, $v_{rep}$, of the near-end interference TX signal (or outbound waveform), $v_{ne,int}$, and subtracting it from the signal on the line, which includes the desired signal (or inbound waveform) from the far-end TX, $v_{fe,des}$. These are combined to form a differential received waveform

$$v_{hyb} = (v_{fe,des} + v_{ne,int}) - v_{rep}. \quad (15)$$

Therefore, recovery of the far-end desired signal,  $v_{fe,des}$ , requires  $v_{rep} = v_{ne,int}$ , as expected. The differential amplifier was implemented as a switched-capacitor circuit that sampled both the replica driver output and the signal on the line, followed by a clocked sense amplifier that regenerates the desired signal to full-rail, as shown in Fig. 9.

The main advantage of this work over [43] is that it does not require two extra channels for reference signals. It also avoids the timing issues associated with the 2:1 MUX used in [43]. However, this work still has a few drawbacks. First, matching the output of the replica driver to the output of the near-end driver, that is, making $v_{rep} = v_{ne,int}$, is quite challenging. Unlike the transmit driver, the replica driver is not loaded by the channel. Matching these two signals at high frequency therefore becomes difficult, and high-frequency signal content from the near-end transmitter leaks into the receiver signal path. Second, the replica driver costs additional power and area. Finally, generating the required sampling phases and ensuring that the signals at the input to the switched-capacitor circuit can be successfully sampled becomes a significant challenge at very high (>10-Gb/s) data rates.
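The high-frequency mismatch described above can be visualized with a toy discrete-time model of (15). In the sketch below, which is an assumption-laden illustration rather than the circuit in [51], both the line waveform $v_{ne,int}$ and the replica $v_{rep}$ are single-pole responses to the same bit stream, but the replica has a faster pole because it is not loaded by the channel. The pole coefficients are arbitrary.

```python
import numpy as np


def single_pole_wave(bits, alpha, swing=1.0):
    """Discrete-time single-pole response: y[n] = y[n-1] + alpha * (swing*bits[n] - y[n-1])."""
    y = np.zeros(len(bits))
    prev = 0.0
    for n, bit in enumerate(bits):
        prev += alpha * (swing * bit - prev)
        y[n] = prev
    return y


bits = [0] * 5 + [1] * 40 + [0] * 40
v_ne_int = single_pole_wave(bits, alpha=0.5)  # driver loaded by the channel (slower)
v_rep = single_pole_wave(bits, alpha=0.9)     # unloaded replica driver (faster)
residual = v_ne_int - v_rep                   # echo left in v_hyb by (15)
print("peak residual:", np.max(np.abs(residual)))
print("residual at end of the run of ones:", residual[44])
```

Because the two waveforms agree once they settle, the residual that leaks into $v_{hyb}$ is confined to the data transitions, i.e., it is high-frequency content.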

Among the hybrid architectures, the replica driver approach is the most popular because of its simple architecture and ability to operate at >10-Gb/s. Indeed, the designs in [17], [18], [19], [20], [22], [23], [24], [25], [26], [27], [28], [30], [31], [36], [38], [49], [52], [53], and [55] used this technique to remove the near-end TX signal at the hybrid output.

### C. SPLIT-TERMINATION, $R - g_m$ HYBRID

To remove the near-end transmitter signal, the work in [41] used a differential split-termination,  $R-g_m$  hybrid, as shown in Fig. 10.



**FIGURE 10.** Split-termination,  $R - g_m$  hybrid transceiver used in [41]. Note that a single-ended version is shown for simplicity, whereas the actual circuit is fully differential.

The split-termination, $R-g_m$ hybrid overcomes the disadvantages of [51], particularly the high-frequency mismatch between the main driver and the replica driver. It does this by using a current-sensing resistor, $R$, such that the output current of the near-end current-mode transmitter, $i_t$, is split across $Z_0 - R$ and $R$ via currents $i_1$ and $i_2$, respectively, generating voltages $v_x$ and $v_y$ at nodes $X$ and $Y$, respectively. The voltages $v_x$ and $v_y$ are then weighted by transconductors $G_{m1}$ and $G_{m2}$, respectively, producing currents $i_x$ and $i_y$, which are combined with opposite signs and passed through resistor $r$ to (ideally) remove the near-end interference signal. The voltage due to the near-end transmitter at the output of the hybrid (across resistor $r$), $v_{ne,ech}$, is

$$\begin{aligned} v_{ne,ech} &= (i_x + i_y)r \\ &= (G_{m1}v_x + G_{m2}v_y)r. \end{aligned} \quad (16)$$

To ensure  $v_{ne,ech} = 0$  at the output of the hybrid requires appropriate selection of  $G_{m1}$  and  $G_{m2}$ . Therefore, solving for  $v_y$  in terms of  $v_x$  in Fig. 10 gives

$$v_y = v_x \left( \frac{Z_0}{Z_0 + R} \right). \quad (17)$$

Substituting (17) into (16) and setting  $v_{ne,ech} = 0$  gives

$$G_{m1} = -G_{m2} \frac{Z_0}{Z_0 + R} \quad (18)$$

as the condition for perfect cancellation of the near-end transmitter signal at the output of the  $R-g_m$  hybrid. It can be shown that the recovered far-end signal at the output of the hybrid,  $v_{fe,rec}$ , is given by

$$v_{fe,rec} = \left( v_y G_{m2} \frac{R}{Z_0} \right) r. \quad (19)$$
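Equations (16) to (18) can be exercised numerically. The Python sketch below computes the near-end echo at the hybrid output for a given $v_x$, obtains $v_y$ from (17), and by default sets $G_{m1}$ from (18); passing a perturbed $G_{m1}$ models a transconductor mismatch. All component values are hypothetical.

```python
def rgm_echo(v_x, z0, r_sense, gm2, r, gm1=None):
    """Near-end echo across the output resistor r of the R-gm hybrid.

    v_x     -- near-end TX voltage at node X
    r_sense -- current-sensing resistor R
    gm1     -- defaults to the cancellation value in (18)
    """
    v_y = v_x * z0 / (z0 + r_sense)          # (17)
    if gm1 is None:
        gm1 = -gm2 * z0 / (z0 + r_sense)     # (18)
    return (gm1 * v_x + gm2 * v_y) * r       # (16)


# Hypothetical values: 0.2-V swing at X, Z0 = 50 ohm, R = 10 ohm, Gm2 = 10 mS, r = 1 kohm.
ideal_gm1 = -10e-3 * 50 / (50 + 10)
print(rgm_echo(0.2, 50.0, 10.0, 10e-3, 1e3))                        # ~0: ideal cancellation
print(rgm_echo(0.2, 50.0, 10.0, 10e-3, 1e3, gm1=1.05 * ideal_gm1))  # 5% Gm1 mismatch
```

With these numbers, a 5% error in $G_{m1}$ leaves roughly $-83$ mV of echo at the output, illustrating how sensitive the cancellation is to the $g_m$ ratio.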

One advantage of the split-termination hybrid is continuous-time signal recovery, which removes the timing mismatch issues of the works in [43] and [51]. This also means that the inbound and outbound signals can operate at different frequencies. Another advantage is that it consumes less power and area than the replica driver approach in [51], since it requires neither a (scaled) replica driver nor additional clocking circuitry to match the timing between the replica and output drivers [41]. Finally, since the driver is isolated from the pad, it can operate at higher frequencies because it does not see the parasitic capacitance of the pad, unlike the works in [43] and [51].

However, this work has its own drawbacks. First, any mismatch between the two $g_m$-circuits in the hybrid leaves a residual error signal. Second, matching the ratio of the $g_m$-circuits across a broad frequency range and PVT is particularly challenging. Third, any variation in the current-sensing resistor, $R$, can also result in a residual error signal. Fourth, the $g_m$-circuits need to be linear to avoid distortion, which limits the differential swing at the input to the hybrid. Fifth, the circuit requires differential inputs to avoid the delay mismatch of a single-ended inversion, increasing the area of the transceiver. Sixth, the split-termination hybrid attenuates the far-end desired signal via the passive termination, decreasing the amplitude of the recovered signal at the hybrid output. Finally, there is a tradeoff between the amplitude of the recovered far-end signal at the hybrid output, $v_{fe,rec}$, and the efficiency of the current-mode transmitter in Fig. 10, since increasing $R$ increases $v_{fe,rec}$ but decreases $i_t$ and thus the amplitude of the outbound signal.

Compared to the replica driver and dynamic reference switching hybrid architectures, the split-termination hybrid is a relatively new approach for removing the near-end TX interference signal and has been used in only three other works: [32], [33], and [56]. Nonetheless, as demonstrated in [32] and [56], this technique is a promising alternative to the replica driver hybrid at data rates above 10 Gb/s.

So far, we have surveyed hybrid circuits applied to SBD signaling between separately packaged chips. Next, we will discuss more recent SBD transceiver works specifically for D2D links, including their respective measurement results.

#### D. DIRECTIONAL-INVERTER-BUFFER- $g_m$ HYBRID

To remove the near-end transmitter signal, Wary and Mandal [33] implemented a variation of the split-termination method used in [41], with the passive current-sensing resistor  $R$  in Fig. 10 replaced by an active termination, as shown in Fig. 11.

**FIGURE 11.** DIB-$g_m$ hybrid circuit used in [33]. Top: Schematic of the DIB circuit. Bottom: Schematic of the $g_m$-adder. Note that a single-ended version of the DIB-$g_m$ hybrid is shown for simplicity, whereas the actual circuit is fully differential.

As shown in Fig. 11, the directional-inverter-buffer (DIB)-$g_m$ hybrid operates by inverting the near-end interference signal, $v_{ne,int}$, such that the signal at node B is attenuated by $\alpha$ and is 180° out of phase with respect to $v_{ne,int}$ at node A. The signals at nodes A and B then pass through the $g_m$-circuits $G_{m1}$ and $G_{m2}$, respectively, where they are weighted and then summed by the $g_m$-adder to remove $v_{ne,int}$ at the output of the hybrid. The desired far-end signal at node B, $v_{fe,des}$, is instead amplified by $\beta$ as it passes through the DIB to node A, producing $\beta v_{fe,des}$ at node A. These two far-end components add after passing through their respective $g_m$-circuits, resulting in an amplified version of the desired far-end signal. It can be shown that the signal at the output of the hybrid, $v_{hyb}$, is given by

$$v_{hyb} = [(G_{m1}\beta + G_{m2})v_{fe,des} + (G_{m1} - G_{m2}\alpha)v_{ne,int}]R_{rx}. \quad (20)$$

Therefore, removing  $v_{ne,int}$  requires

$$\alpha = \frac{G_{m1}}{G_{m2}} \quad (21)$$

from which (20) becomes

$$v_{hyb} = G_{m1}R_{rx}(\beta + \alpha^{-1})v_{fe,des} \quad (22)$$

where $0 < \alpha < 1$ and $\beta > 1$ are the gain magnitudes from node A to node B and from node B to node A, respectively. In summary, the DIB provides three functions in one circuit:

- 1) Termination of the link in the characteristic impedance of the channel,  $Z_0$ .
- 2) Inversion of the near-end interference signal for easy cancellation via weighted-addition.
- 3) Amplification of the desired far-end signal for easier signal recovery.
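As a sanity check on (20) through (22), the short Python sketch below evaluates the hybrid output with an idealized, linear, memoryless model in which node A carries $v_{ne,int} + \beta v_{fe,des}$ and node B carries $-\alpha v_{ne,int} + v_{fe,des}$, as described above. All element values are hypothetical and chosen only for illustration; they are not those of [33].

```python
# Numerical check of (20)-(22) for the DIB-g_m hybrid.
# Ideal, linear, memoryless model; all element values are hypothetical.

def dib_gm_output(v_fe_des, v_ne_int, gm1, gm2, alpha, beta, r_rx):
    """Hybrid output v_hyb per (20).

    Node A carries v_ne_int + beta * v_fe_des; node B carries
    -alpha * v_ne_int + v_fe_des. G_m1 senses node A, G_m2 senses node B,
    and the g_m-adder sums both output currents into R_rx.
    """
    v_a = v_ne_int + beta * v_fe_des
    v_b = -alpha * v_ne_int + v_fe_des
    return (gm1 * v_a + gm2 * v_b) * r_rx


gm1, gm2 = 1e-3, 2.5e-3   # hypothetical transconductances (S)
alpha = gm1 / gm2         # cancellation condition (21), here 0.4
beta, r_rx = 2.0, 1e3     # hypothetical DIB gain (B -> A) and adder load (ohm)

# Near-end interference only: cancelled (to rounding error) when (21) holds.
echo = dib_gm_output(0.0, 0.1, gm1, gm2, alpha, beta, r_rx)

# Far-end signal only: should equal (22), G_m1 * R_rx * (beta + 1/alpha) * v_fe_des.
v_fe = 0.05
far = dib_gm_output(v_fe, 0.0, gm1, gm2, alpha, beta, r_rx)
far_eq22 = gm1 * r_rx * (beta + 1 / alpha) * v_fe

# A 5% error in alpha leaves a residual of about -5 mV for a 100-mV interferer.
residual = dib_gm_output(0.0, 0.1, gm1, gm2, 1.05 * alpha, beta, r_rx)

print(f"echo = {echo:.2e} V, far-end = {far:.4f} V (eq. 22: {far_eq22:.4f} V)")
print(f"residual with 5% alpha error = {residual * 1e3:.2f} mV")
```

With $\alpha$ set by (21), the interference term vanishes and the far-end term matches (22); a small error in $\alpha$ leaves a residual proportional to the error, which is why the $G_{m1}$-to-$G_{m2}$ ratio must track across PVT.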

Compared to the $R-g_m$ hybrid, the DIB-$g_m$ hybrid has three important advantages. First, it presents a high impedance at node A, which helps to amplify the desired far-end current signal. Second, it provides a low impedance at node B, terminating the channel with an active circuit rather than with the large passive resistor, and its significant parasitic capacitance, used in the $R-g_m$ hybrid; this helps to extend the bandwidth of the transceiver. Finally, its transconductors, $G_{m1}$ and $G_{m2}$, are more power efficient than those of the $R-g_m$ hybrid. That is, if $G_{m1}$ and $G_{m2}$ in the DIB-$g_m$ hybrid are the same as $G_{m1}$ and $G_{m2}$ in the $R-g_m$ hybrid, it can be shown that the ratio of the transconductor power of the $R-g_m$ hybrid to that of the DIB-$g_m$ hybrid is given by [33]

$$\frac{P_{R-g_m}}{P_{\text{DIB}-g_m}} = \left( \frac{1 + \left( \frac{Z_0 + R}{Z_0} \right)^2}{1 + \alpha^{-2}} \right) \left[ \frac{(\beta + \alpha^{-1})Z_0}{2R} \right]^2. \quad (23)$$

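Because $\beta > 1$, $(\beta + \alpha^{-1})^2 > (1 + \alpha^{-1})^2 > 1 + \alpha^{-2}$, so (23) exceeds $[1 + (1 + R/Z_0)^2]Z_0^2/(4R^2)$, which is greater than one for $R < Z_0$. Hence, since $0 < \alpha < 1$, $\beta > 1$, and $R < Z_0$, $P_{R-g_m} > P_{\text{DIB}-g_m}$, which implies that the DIB-$g_m$ hybrid is more power efficient than the $R-g_m$ hybrid.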
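The claim that (23) exceeds unity can also be spot-checked numerically. The sketch below evaluates (23) on a grid of hypothetical values satisfying $0 < \alpha < 1$, $\beta > 1$, and $R < Z_0$; a grid search is supporting evidence, not a proof.

```python
import itertools


def power_ratio(alpha, beta, r, z0):
    """Transconductor power ratio P_{R-g_m} / P_{DIB-g_m} from (23)."""
    first = (1 + ((z0 + r) / z0) ** 2) / (1 + alpha ** -2)
    second = ((beta + 1 / alpha) * z0 / (2 * r)) ** 2
    return first * second


z0 = 50.0                                        # hypothetical Z_0 (ohm)
alphas = [0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99]   # 0 < alpha < 1
betas = [1.01, 1.5, 2.0, 4.0]                    # beta > 1
rs = [1.0, 10.0, 25.0, 40.0, 49.9]               # R < Z_0 (ohm)

ratios = [power_ratio(a, b, r, z0)
          for a, b, r in itertools.product(alphas, betas, rs)]
print(f"min ratio = {min(ratios):.3f} over {len(ratios)} grid points")
```

On this grid the smallest ratio occurs at small $\alpha$, where the first factor in (23) shrinks but the second grows roughly as $\alpha^{-2}$, keeping the product above one.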

Unfortunately, the DIB-$g_m$ hybrid also has drawbacks. First, as with the $R-g_m$ hybrid, it suffers from signal-dependent nonlinearity in its $g_m$-circuits, which limits the signal swing and hence the vertical eye opening at the hybrid output. Indeed, because the DIB-$g_m$ hybrid uses additional active circuits (the DIB for terminating the link and the $g_m$-adder for the receiver), it faces greater signal-dependent nonlinearity challenges than its predecessor, the $R-g_m$ hybrid. Second, the speed of the circuit is limited by the cross-coupled nMOS pair $M_1$ and $M'_1$, which generates a negative impedance at node A to invert the signal from node A to node B. Finally, as in the $R-g_m$ hybrid circuit in [41], removing the near-end interference (i.e., the output of the near-end TX) requires matching the ratio of $G_{m1}$ to $G_{m2}$ across PVT, which is very challenging at high frequencies.

The prototype implementation in [33] was designed in 180-nm CMOS at a full-duplex data rate of 4-Gb/s over a differential 5-mm link, with an area of $1275\text{-}\mu\text{m}^2$ and a power efficiency of 0.95-pJ/bit from a 1.8-V supply.

#### E. RESISTOR-BRIDGE REPLICA DRIVER HYBRID FOR D2D LINKS

Yuan et al. [28] implemented a resistor-bridge replica driver hybrid circuit for removal of the near-end transmitter signal, as shown in Fig. 12.

**FIGURE 12.** Resistor-bridge replica driver hybrid used in [28].

As shown in Fig. 12, the hybrid can be classified as a variation of the replica driver approach in which the predriver serves as the replica driver. To cancel the near-end transmitter output, a variation of a Wheatstone bridge (resistor bridge) is formed by splitting the feedback resistor of the transimpedance amplifier (TIA) output driver in half, and the node connecting the two halves, node B in Fig. 12, is used as the receiving node of the hybrid. As a result, the output of the predriver, $v_{ne,int}$, reaches node B through both the left half of the bridge, via node A, and the right half of the bridge, via node C, where it has been inverted to $-v_{ne,int}$. These two signals are summed at node B so that they (ideally) cancel each other out.

One benefit of this work is that the feedback resistor in the predriver improves the slew rate of the input signal to the TIA driver, which raises the maximum data rate of the link [28]. Furthermore, reusing the predriver as the replica driver saves power and area compared to the replica driver in [43]. However, using the resistor bridge for near-end echo removal attenuates the far-end (desired) signal because of the load presented by the predriver; that is, the output impedance of the predriver affects the signal integrity of the received signal. Another challenge is that gain and timing mismatch between the predriver and the output driver at high frequencies lets some high-frequency content of the near-end transmit signal leak to the output of the hybrid.
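The gain-mismatch limitation can be illustrated with a minimal sketch that treats the bridge as an ideal resistive divider between nodes A and C, with node B unloaded. This is a simplification we assume for illustration: it ignores the predriver output impedance, the receiver input loading, and timing mismatch. With equal halves, $v_B = (v_A + v_C)/2$, so an output-driver gain magnitude $|G| \neq 1$ leaves a residual of $(1 - |G|)v_A/2$. The resistor value and swing are hypothetical.

```python
def bridge_node_b(v_a, v_c, r_ab, r_bc):
    """Voltage at node B of a resistive divider between nodes A and C.

    Superposition, with node B assumed unloaded (a simplifying assumption).
    r_ab: resistor between A and B; r_bc: resistor between B and C.
    """
    return (v_a * r_bc + v_c * r_ab) / (r_ab + r_bc)


r_half = 500.0   # hypothetical half of the TIA feedback resistor (ohm)
v_a = 0.2        # hypothetical near-end swing at node A (V)

for gain in (1.0, 0.95, 0.9):
    v_c = -gain * v_a   # inverting output driver with gain magnitude `gain`
    res = bridge_node_b(v_a, v_c, r_half, r_half)
    print(f"|gain| = {gain:.2f}: residual at node B = {res * 1e3:.1f} mV")
```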

The prototype design in [28] was implemented in 65-nm CMOS at a full-duplex data rate of 15-Gb/s over a differential 10-mm link on a PCB, with an area of $150 \times 80\text{-}\mu\text{m}^2$ and a power consumption of 10.125-mW from a 1.2-V supply. A resistive-feedback, inverter-based TIA was used as the output driver.

#### F. REPLICA DRIVER HYBRID FOR D2D LINKS

Nishi et al. [18] removed the near-end transmitter signal via an inverter-based replica driver and a TIA that adds the voltages  $v_{rep}$  and  $v_{ne,int}$  together, as shown in Fig. 13. Assuming that  $R_{h1} \gg R_{SST}$ , the output of the TIA due to the near-end interference signal,  $v_{ne,ech}$ , is (approximately) given by

$$v_{ne,ech} \approx -\left( \frac{R_F}{R_{h2}} \right) v_{rep} - \left( \frac{R_F}{R_{h1}} \right) \left( \frac{Z_0}{R_{SST} + Z_0} \right) v_{ne,int}. \quad (24)$$

With  $v_{rep} = -v_{ne,int}$ , (24) becomes

$$v_{ne,ech} \approx R_F \left[ \frac{R_{h1}(R_{SST} + Z_0) - R_{h2}Z_0}{(R_{SST} + Z_0)R_{h1}R_{h2}} \right] v_{ne,int}. \quad (25)$$

Therefore, the near-end transmitter signal is removed at the hybrid output if $R_{h2} \approx (R_{SST}/Z_0 + 1)R_{h1}$. To generate $v_{rep} = -v_{ne,int}$, the main driver (TX in Fig. 13) drives $V_S = \pm V_{DD}/2$ into the line, whereas the replica driver drives $-V_S$, such that the output currents of the main and replica drivers cancel. Note that the threshold voltage of the inverters serves as the signal ground [18].

**FIGURE 13.** Replica driver hybrid transceiver used in [18].
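Equations (24) and (25) and the resulting cancellation condition can be checked numerically. The sketch below uses hypothetical element values (not those of [18]), evaluates (25) directly, and cross-checks it against (24) with $v_{rep} = -v_{ne,int}$; an error in $R_{h2}$ leaves a residual echo gain roughly proportional to that error.

```python
def echo_eq24(r_f, r_h1, r_h2, r_sst, z0, v_ne_int, v_rep):
    """Near-end echo at the TIA output per (24)."""
    return (-(r_f / r_h2) * v_rep
            - (r_f / r_h1) * (z0 / (r_sst + z0)) * v_ne_int)


def echo_gain_eq25(r_f, r_h1, r_h2, r_sst, z0):
    """Residual echo gain v_ne,ech / v_ne,int per (25), i.e., with v_rep = -v_ne,int."""
    num = r_h1 * (r_sst + z0) - r_h2 * z0
    den = (r_sst + z0) * r_h1 * r_h2
    return r_f * num / den


# Hypothetical element values (ohm), for illustration only.
r_f, r_h1, r_sst, z0 = 2000.0, 1000.0, 50.0, 50.0
r_h2_ideal = (r_sst / z0 + 1) * r_h1   # cancellation condition stated after (25)

for err in (0.0, 0.01, 0.05):
    r_h2 = r_h2_ideal * (1 + err)
    g25 = echo_gain_eq25(r_f, r_h1, r_h2, r_sst, z0)
    g24 = echo_eq24(r_f, r_h1, r_h2, r_sst, z0, v_ne_int=1.0, v_rep=-1.0)
    print(f"R_h2 error {err:4.0%}: gain (25) = {g25:+.4f}, from (24) = {g24:+.4f}")
```

Because the required ratio $R_{h2}/R_{h1}$ depends on $R_{SST}/Z_0$, the sketch also makes clear that cancellation relies on the driver and channel impedances, not only on the two hybrid resistors.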

The key advantage of this design is its simplicity, which makes it compact, low power, and scalable. CMOS inverters are used for the main and replica drivers and in the receiver, which makes the design easy to scale with technology; they also allow low (dynamic-only) power consumption and small area. In [18], the area of the full transceiver was limited by the 55-$\mu\text{m}$ micro-bump pitch.

The hybrid in this work also presents a few challenges. In practice, $R_{h1}$ is not infinitely larger than $R_{SST}$, so the hybrid loads the line and affects the termination of the SBD link. Furthermore, removing the outbound signal (the near-end transmitter output) requires precisely setting the ratio $R_{h2}/R_{h1} \approx 1 + R_{SST}/Z_0$, which may be difficult to achieve across PVT.

The prototype implementation in [18] was designed in 5-nm CMOS for a 1.2-mm D2D link at a full-duplex data rate of 50.4-Gb/s, with an area of 0.0045-mm<sup>2</sup> and a power consumption of 7.5-mW from a 0.75-V supply per lane (one TX and one RX).

To compare the hybrid architectures in terms of scalability, speed, and power, Fig. 14 plots speed versus technology node and power versus speed for the three hybrid architectures considered in this article. Fig. 14(a) shows that the replica driver hybrid scales best with technology while also achieving the highest speed, followed by the split-termination and dynamic reference switching hybrids. As discussed previously, the speed of the dynamic reference switching hybrid is limited by the difficulty of designing a MUX that operates above 10 Gb/s. In Fig. 14(b), the replica driver and split-termination hybrids both offer the lowest power consumption while operating above 10 Gb/s. However, of the three architectures, only the replica driver hybrid has been demonstrated at full-duplex data rates above 100 Gb/s.

The hybrid architectures discussed in this section are not the only approaches for recovering the desired far-end signal in SBD signaling. For example, the works in [21] and [44] used a correlation-based approach to remove the near-end interference signal and recover the desired far-end signal.

**FIGURE 14.** (a) Speed versus technology for three hybrid architectures: dynamic reference switching (blue), replica driver (orange), and split-termination (red). (b) Power versus speed for three hybrid architectures: dynamic reference switching (blue), replica driver (orange), and split-termination (red).

## IV. SPLIT-TERMINATION, PASSIVE HYBRID

This section provides a more detailed description of a single-ended, split-termination passive hybrid SBD transceiver design for D2D links [56]. We begin with the design of the passive hybrid, followed by key post-layout simulation results that demonstrate the efficacy of the transceiver. Finally, post-layout simulations demonstrate the impact of timing in SBD links discussed in Section II.

**FIGURE 15.** (a) Block diagram of the SBD transceiver implemented in [56]. (b) Schematic of the passive hybrid with nominal values for the resistors and capacitors.

The SBD transceiver is a half-rate design with a full-duplex data rate of 32 Gb/s (16 Gb/s + 16 Gb/s). It was implemented in 16-nm FinFET CMOS technology for a 5-mm D2D link in an organic substrate and consumes 6.7-mW (excluding the clock buffers) per transceiver from a 0.9-V supply. A block diagram of the complete transceiver is shown in Fig. 15(a), with a zoomed-in view of the passive hybrid, including the nominal values of its resistors and capacitors, in Fig. 15(b). Note that the input capacitance of the independent half-rate slicers is $<10\text{-fF}$, so their impact on the capacitance of the hybrid is negligible.
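The energy-efficiency figures in the titles of [18] (0.297 pJ/bit) and [56] (0.42 pJ/bit) can be reproduced from the power numbers quoted in this section if the power of the transceivers at both ends of the link is divided by the aggregate full-duplex data rate (equivalently, one transceiver's power divided by the per-direction rate). This accounting convention is our inference from the numbers, not a statement taken from either paper; the small difference for [18] is consistent with rounding of the 7.5-mW figure.

```python
def energy_per_bit_pj(power_per_trx_mw, full_duplex_rate_gbps, trx_per_link=2):
    """Energy efficiency in pJ/bit (mW divided by Gb/s gives pJ/bit).

    Assumed convention: the power of the transceivers at both ends of the
    wire is charged against the aggregate full-duplex data rate.
    """
    return trx_per_link * power_per_trx_mw / full_duplex_rate_gbps


# Figures quoted in this section.
e18 = energy_per_bit_pj(7.5, 50.4)   # [18]: 7.5 mW per lane (1 TX + 1 RX), 50.4 Gb/s
e56 = energy_per_bit_pj(6.7, 32.0)   # [56]: 6.7 mW per TRX (no clock buffers), 32 Gb/s
print(f"[18]: {e18:.3f} pJ/bit, [56]: {e56:.3f} pJ/bit")
```

Because the 6.7-mW figure for [56] excludes the clock buffers, the resulting 0.42 pJ/bit likewise excludes them.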

Our work combines the approaches in [28] and [41] to exploit the advantages of each. In particular, an inverter-based TIA voltage-mode driver, as in [28], was used as the output driver to reduce signal reflections, whereas the hybrid was implemented as a split-termination passive hybrid, similar to [41], composed of an $RC$ split-path filter (passive CTLE). Compared to the $R-g_m$ hybrid, the passive hybrid, first, minimizes the power consumption of the transceiver and, second, improves matching across PVT; its drawback is increased area. The hybrid is followed by two independent half-rate, double-tail latched comparators. Mesosynchronous clocking was used to correlate the jitter of the data and the RX clock [56].

As discussed in Section II, far-end signal reflections introduce significant signal integrity challenges in SBD signaling for D2D links. Minimizing these reflections is important because they add to the near-end echo signal at the hybrid output, and because D2D channels have little attenuation, the reflections can be large enough to cause errors. An inverter-based TIA driver helps to reduce reflections due to impedance discontinuities by keeping the output impedance of the driver approximately independent of the transitions of the outbound data, with that output impedance matched to the channel according to

$$Z_0 \approx R_{sp} + \frac{1}{2g_m}. \quad (26)$$

The split-termination passive hybrid is similar to the split-termination $R-g_m$ hybrid, except that the $g_m$-circuits are replaced with two passive CTLEs, $H_{13}(s)$ and $H_{24}(s)$, as shown in Fig. 15(b), such that the node voltages $v_{H_3}$ and $v_{H_4}$ at the output of the hybrid depend only on ratios of passive components. This improves matching across PVT compared to [41].

Excluding $R_{DC}$ and $R_{sp}$, all resistors in the hybrid are programmable using 5-bit thermometer encoding. Resistors $R_{1,1}$ and $R_{1,2}$ are programmable from $1\text{-k}\Omega$ to $2\text{-k}\Omega$ by connecting $8\text{-k}\Omega$ resistors in parallel, whereas $R_{2,1}$ and $R_{2,2}$ are tunable from $1.67\text{-k}\Omega$ to $3.33\text{-k}\Omega$ via parallel connections of $13.33\text{-k}\Omega$ resistors, and from $7\text{-k}\Omega$ to $14\text{-k}\Omega$ via parallel connections of $56\text{-k}\Omega$ resistors, respectively. Resistor $R_{DC}$ is programmable from $62.5\text{-k}\Omega$ to $500\text{-k}\Omega$ via an 8-bit thermometer encoding that connects $500\text{-k}\Omega$ resistors in parallel. Finally, capacitors $C_{2,1}$ and $C_{2,2}$ are tunable from $50\text{-fF}$ to $150\text{-fF}$ in $25\text{-fF}$ steps using 5-bit thermometer encoding, whereas capacitors $C_{1,1}$ and $C_{1,2}$ are programmable from $83.33\text{-fF}$ to $250\text{-fF}$ and from $350\text{-fF}$ to $1.05\text{-pF}$ in $41.67\text{-fF}$ and $175\text{-fF}$ steps, respectively, also with 5-bit thermometer encoding.
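The tuning ranges above follow from unit-element arithmetic: $n$ identical resistors $R_u$ in parallel give $R_u/n$, and $n$ identical capacitors $C_u$ in parallel give $nC_u$. The sketch below assumes each setting enables between $n_{min}$ and $n_{max}$ unit elements, with the counts inferred from the quoted end points (4 to 8 resistors for the 5-bit codes, 1 to 8 for $R_{DC}$, and 2 to 6 capacitors). How the thermometer codes in [56] map to enabled units is an assumption here, not a detail reported in this article.

```python
# Hypothetical model of the programmable elements: each setting enables n identical
# unit elements in parallel, with n_min..n_max inferred from the quoted ranges.
ELEMENTS = [
    # (name, unit value, kind, n_min, n_max)
    ("R_1,1 and R_1,2", 8e3, "R", 4, 8),
    ("R_2,1", 40e3 / 3, "R", 4, 8),      # ~13.33-kOhm unit
    ("R_2,2", 56e3, "R", 4, 8),
    ("R_DC", 500e3, "R", 1, 8),
    ("C_2,1 and C_2,2", 25e-15, "C", 2, 6),
    ("C_1,1", 250e-15 / 6, "C", 2, 6),   # ~41.67-fF unit
    ("C_1,2", 175e-15, "C", 2, 6),
]


def settings(unit, kind, n_min, n_max):
    """Values reachable with n_min..n_max identical unit elements in parallel."""
    counts = range(n_min, n_max + 1)
    if kind == "R":
        return [unit / n for n in counts]   # parallel resistors: R_unit / n
    return [unit * n for n in counts]       # parallel capacitors: n * C_unit


for name, unit, kind, n_min, n_max in ELEMENTS:
    vals = settings(unit, kind, n_min, n_max)
    scale, label = (1e-3, "kOhm") if kind == "R" else (1e15, "fF")
    print(f"{name:16s} {min(vals) * scale:8.2f} to {max(vals) * scale:8.2f} {label}"
          f" ({len(vals)} settings)")
```

Note that the parallel-resistor ranges are non-uniform in ohms (steps shrink as more units are enabled), whereas the capacitor ranges step uniformly, matching the fixed fF steps quoted above.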

In [56], we showed that removing the near-end transmitter signal requires the attenuation ratios of shunt-arm 1 and shunt-arm 2, $A_1 (= A_{R_1} = A_{C_1})$ and $A_2 (= A_{R_2} = A_{C_2})$, respectively, to satisfy

$$A_1 = A_2 \left( 1 + \frac{R_{sp}}{Z_0} \right)^{-1}. \quad (27)$$

Equation (27) is a specific instance of (5) (presented in Section I) for this hybrid, where $A_1$, $A_2$, and $(1 + R_{sp}/Z_0)^{-1}$ approximate $H_{13}(s)$, $H_{24}(s)$, and $H_{12}(s)$, respectively. This shows how (5) can be used to synthesize different hybrids.
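Equations (26) and (27) can be combined into a small design sketch: (26) fixes $R_{sp}$ once $Z_0$ and the driver $g_m$ are known, and (27) then sets the shunt-arm-1 attenuation for a chosen $A_2$. The model assumes that the hybrid output is $v_{H_3} - v_{H_4}$, with node 1 at the driver output and node 2 at the pad, so that the near-end path to $v_{H_4}$ includes the $H_{12}$ divider; this node assignment is inferred from the subscripts, and all numerical values are hypothetical.

```python
def r_sp_for_match(z0, gm):
    """Series resistance R_sp that satisfies (26): Z_0 ~ R_sp + 1/(2 g_m)."""
    return z0 - 1.0 / (2.0 * gm)


def required_a1(a2, r_sp, z0):
    """Shunt-arm-1 attenuation that satisfies (27): A_1 = A_2 / (1 + R_sp/Z_0)."""
    return a2 / (1.0 + r_sp / z0)


def near_end_residual(a1, a2, r_sp, z0, v1=1.0):
    """v_H3 - v_H4 for a near-end swing v1 at the driver output (node 1), assuming
    v_H3 = A_1 * v1 and v_H4 = A_2 * v2 with v2 = v1 * Z_0 / (Z_0 + R_sp) at the pad."""
    v2 = v1 * z0 / (z0 + r_sp)
    return a1 * v1 - a2 * v2


z0, gm = 50.0, 0.05            # hypothetical channel impedance (ohm) and g_m (S)
r_sp = r_sp_for_match(z0, gm)  # 50 - 1/(2 * 0.05) = 40 ohm
a2 = 0.5                       # hypothetical shunt-arm-2 attenuation
a1 = required_a1(a2, r_sp, z0)

print(f"R_sp = {r_sp:.1f} ohm, A_1 = {a1:.4f}")
print(f"residual (matched):  {near_end_residual(a1, a2, r_sp, z0):+.2e}")
print(f"residual (A_1 +2%): {near_end_residual(1.02 * a1, a2, r_sp, z0):+.2e}")
```

Since the condition involves only ratios of passive components and $R_{sp}/Z_0$, the residual stays small as long as those ratios track across PVT, which is the matching advantage claimed for the passive hybrid.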

The TIA driver was compared to a conventional P-over-N driver to investigate the impact of signal reflections at the output of the passive hybrid. Fig. 16(a) shows that the output impedance of the TIA driver stays approximately constant at $30\text{-}\Omega$ as the input transitions from LO to HI, whereas the output impedance of the P-over-N driver, shown in Fig. 16(b), varies significantly during an input transition. As a result, the P-over-N driver produces a larger signal reflection that corrupts the near-end echo signal at the output of the hybrid: in Fig. 16(c), the peak-to-peak swing of its near-end echo signal is 33-mV. In comparison, the TIA driver produces a significantly smaller reflection during signal transitions and a smaller variation in the near-end echo signal, whose peak-to-peak swing in Fig. 16(c) is 24-mV, a 27% reduction. This confirms that the TIA driver improves signal integrity by reducing signal reflections.

**FIGURE 16.** Driver output impedance for (a) TIA driver and (b) P-over-N driver during data transition. (c) Near-end echo signal for TIA driver (orange) and P-over-N driver (blue).

**FIGURE 17.** Eye diagrams at 32-Gb/s (16-Gb/s + 16-Gb/s) after 32 000-UI with a PRBS-7 input pattern at the output of the hybrids on (a) die A and (b) die B. Note that the sources on die A and B were mesosynchronous for this simulation.

Fig. 17(a) and (b) show the eye diagrams at the hybrid outputs on dies A and B, respectively, after 32 000-UI with a PRBS-7 input data pattern and power supply noise included. Each eye has a width of 0.7-UI, and the eye heights on die A (blue) and die B (red) are 47-mV and 42.4-mV, respectively. The full-duplex data rate is 32-Gb/s (16-Gb/s + 16-Gb/s), and the two TRXs are mesosynchronous. Fig. 18 shows a power breakdown of a single TRX.



**FIGURE 18.** Power breakdown of a single TRX for the split-termination passive hybrid in [56]. Simulated at the TT, 80°C, 0.9-V corner.



**FIGURE 19.** Sending 1-UI pulses simultaneously from each die while sweeping the delay of the clock on die A with respect to the clock on die B to get the peak of the received signal at the hybrid outputs on die A (blue) and B (orange) as a function of phase offset.

Finally, the impact of the time delay between the clocks on dies A and B on the received waveform at each hybrid output, $v_{hyb}$ (desired + near-end echo), was simulated by simultaneously sending a single 1-UI pulse from the transmitters on dies A and B while sweeping the delay of the clock on die A with respect to the clock on die B. The results in Fig. 19 show that the peaks of the signals at the two hybrid outputs vary in opposite directions, confirming the analysis of timing issues in SBD signaling presented in Section II. The signal integrity on each die therefore depends on the relative clock phases of the two transmitters and on the length of the channel, and this variation cannot be corrected even by adjusting the RX clock phase on each die independently and arbitrarily.
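The opposite trends in Fig. 19 can be illustrated with a deliberately simplified timing model that we assume here; it ignores pulse shape, reflections, and the hybrid response, and is not the analysis of Section II. Die B's TX clock edge is placed at $t = 0$, die A's at $t = \delta$, and both directions see the same channel delay $t_d$. The far-end data then arrive at die A offset by $t_d - \delta$ from A's own TX clock, and at die B offset by $t_d + \delta$ from B's. The two offsets always sum to $2t_d$ modulo one UI regardless of $\delta$, so a clock shift that moves the far-end data earlier relative to the local TX on one die moves it later on the other.

```python
UI = 1.0  # time normalized to one unit interval


def far_end_offsets(delta, t_d, ui=UI):
    """Phase (mod UI) of the arriving far-end data relative to each die's own TX clock.

    Simplified model (assumption): die B's TX clock edge is at t = 0, die A's at
    t = delta, and both directions see the same channel delay t_d.
    """
    off_a = (t_d - delta) % ui   # B launches at 0, arrives at A at t_d; A's edge at delta
    off_b = (delta + t_d) % ui   # A launches at delta, arrives at B at delta + t_d
    return off_a, off_b


t_d = 0.3  # hypothetical channel delay (UI)
for delta in (0.0, 0.1, 0.2):
    a, b = far_end_offsets(delta, t_d)
    print(f"delta = {delta:.1f} UI: die A {a:.2f} UI, die B {b:.2f} UI, "
          f"sum mod UI = {(a + b) % UI:.2f} UI")
```

Because only the channel delay sets the sum of the two offsets, independently retiming each RX cannot restore the alignment between far-end data and local TX transitions on both dies at once.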

## V. CONCLUSION

This article reviewed the trend toward heterogeneous integration, including the importance of D2D links for connecting chiplets within integrated systems-in-package. SBD signaling is an alternative to UD signaling that increases the bandwidth of D2D links, and it is a promising signaling method for D2D links because of its $2\times$ better pin efficiency. However, it has its own unique signaling challenges, such as the near-end echo signal from the local TX, far-end echo signals due to imperfect channel termination, and timing challenges due to channel delay and the relative phase offset between the TX clocks at opposite ends of the link. These SBD signal integrity challenges were discussed, along with the approaches previous SBD architectures have used to cancel the near-end transmitter output signal at the receiver: dynamic reference-switching, the replica driver, and the split-termination hybrid. This was followed by a review of recent SBD transceivers for D2D links, including a detailed description of a split-termination, passive hybrid as a high-speed, low-power alternative to the works in [18], [28], and [33]. The analysis presented in this article may help to highlight and understand the signal integrity tradeoffs in SBD transceiver designs for D2D links, which may be expected to proliferate as bandwidth demands increase faster than the number of available connections to each die.

## REFERENCES

- [1] R. Viswanath, A. Chandrasekhar, S. Srinivasan, Z. Qian, and R. Mahajan, "Heterogeneous SoC integration with EMIB," in *Proc. IEEE Electr. Design Adv. Packag. Syst. Symp. (EDAPS)*, Dec. 2018, pp. 1–3.
- [2] T. Li, J. Hou, J. Yan, R. Liu, H. Yang, and Z. Sun, "Chiplet heterogeneous integration technology—Status and challenges," *Electronics*, vol. 9, no. 4, p. 670, Apr. 2020.
- [3] *Chiplets: Designing, Manufacturing, and Testing*, Semicond. Eng., Silicon Valley, CA, USA, 2023.
- [4] P. Gargini, "Roadmap evolution: From NTRS to ITRS, from ITRS 2.0 to IRDS," in *Proc. 5th Berkeley Symp. Energy Effic. Electron. Syst. Steep Transistors Workshop (E3S)*, 2017, pp. 1–62.
- [5] *Samsung's June 2023 Reveal: Enhanced 3nm & 4nm Chip Fabrication Process*, Electropages, Poole, U.K., Accessed: Mar. 25, 2024.
- [6] D. Das Sharma, *The UCIeTM 1.1 Specification: Future Applications of Chiplets*. Accessed: Mar. 24, 2024. [Online Video]. Available: <https://www.youtube.com/watch?v=ZcqVUo5cWDU>
- [7] R. J. Hahn and P. P. Conway, "Multichip modules (MCMs): A review of the status quo," *J. Electron. Manuf.*, vol. 3, no. 1, pp. 1–11, 1993.
- [8] I. Cutress, "Computex 2017: AMD press event live blog." Accessed: Mar. 25, 2024. [Online]. Available: <https://www.anandtech.com/show/11476/computex-2017-amd-press-event-live-blog-starts-10pm-et>
- [9] "Kandou licenses glasswing SerDes technology to Marvell." Design And Reuse. Accessed: Mar. 25, 2024. [Online]. Available: <https://www.design-reuse.com/news/39513/kandou-serdes-marvell.html>
- [10] *Universal Chiplet Interconnect Express (UCIe) Specification 1.1*, UCIe Consortium, Beaverton, OR, USA, Jul. 2023.
- [11] *Chiplets: Designing, Manufacturing, and Testing*, Semicond. Eng., Silicon Valley, CA, USA, 2023.
- [12] W. T. Beyene, "Chiplet technology and heterogeneous integration," *IEEE Electr. Packag. Soc. Newslett.*, to be published.
- [13] *Cisco Annual Internet Report (2018–2023)*, Cisco, San Jose, CA, USA, White Paper, 2020. Accessed: Mar. 25, 2024.
- [14] A. V. Krishnamoorthy, "Optical interconnects in computing and switching systems: The anatomy of a 20Tbps switch card," in *Proc. ISSCC*, 2018, pp. 1–22.
- [15] B. Dehlaghi, N. Wary, and T. C. Carusone, "Ultra-short-reach interconnects for die-to-die links: Global bandwidth demands in microcosm," *IEEE Solid-State Circuits Magazine*, vol. 11, no. 2, pp. 42–53, Jun. 2019.
- [16] W. Beyene, N. Juneja, Y.-C. Hahm, R. Kollipara, and J. Kim, "Signal and power integrity analysis of high-speed links with silicon interposer," in *Proc. IEEE 67th Electron. Compon. Technol. Conf. (ECTC)*, May 2017, pp. 1708–1715.
- [17] G. A. Parulekar, S. Goyal, N. Ajith, I. Mishra, and S. Gupta, "A linear ratioed impedance and driver-based true full-duplex IO with background self-interference cancellation," in *Proc. IEEE 66th Int. Midwest Symp. Circuits Syst. (MWSCAS)*, Aug. 2023, pp. 689–693.
- [18] Y. Nishi et al., "A 0.297-pJ/Bit 50.4-Gb/s/wire inverter-based short-reach simultaneous bi-directional transceiver for die-to-die interface in 5-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 58, no. 4, pp. 1062–1073, Apr. 2023.
- [19] Y. Lee, M. Shim, S. Roh, W. Lee, and D.-K. Jeong, "An 80-Gb/s PAM-4 simultaneous bidirectional transceiver with hybrid adaptation scheme," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 70, no. 8, pp. 2884–2888, Aug. 2023.
- [20] Y. Lee, W. Lee, M. Shim, S. Shin, W.-S. Choi, and D.-K. Jeong, "0.41-pJ/b/dB asymmetric simultaneous bidirectional transceivers with PAM-4 forward and PAM-2 back channels for 5-m automotive camera link," in *Proc. IEEE Symp. VLSI Technol. Circuits*, Jun. 2022, pp. 30–31.
- [21] S. Goyal, G. Parulekar, and S. Gupta, "A true full-duplex IO (TFD-IO) with background SI cancellation for high-density interfaces," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 30, no. 5, pp. 615–624, May 2022.
- [22] P. K. Govindaswamy, N. Wary, and V. S. R. Pasupureddi, "Power efficient echo-cancellation based hybrid for full-duplex chip-to-chip interconnects," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2022, pp. 852–856.
- [23] P. K. Govindaswamy and V. S. R. Pasupureddi, "A power-efficient current-integrating hybrid for full-duplex communication over chip-to-chip interconnects," *Int. J. Circuit Theory Appl.*, vol. 50, no. 12, pp. 4219–4233, 2022.
- [24] A. Manian, A. Rane, Y. Koh, H. K. Nat, and M. Lu, "A simultaneous bidirectional single-ended coaxial link with 24-Gb/s forward and 312.5-Mb/s back channels," *IEEE J. Solid-State Circuits*, vol. 56, no. 3, pp. 972–987, Mar. 2021.
- [25] T. Kishishita and H. Krüger, "Prototype of simultaneous bidirectional data-transmitter in 65 nm CMOS," *J. Inst.*, vol. 16, no. 6, Jun. 2021, Art. no. T06002.
- [26] R. Farjadrad et al., "11.8 An echo-cancelling front-end for 112Gb/s PAM-4 simultaneous bidirectional signaling in 14nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2021, pp. 194–196.
- [27] A. Ebrahimi Jarihani, J. Sturm, and A. M. Tonello, "A full-duplex transceiver for 20-Gbps high-speed simultaneous bidirectional signaling across global on-chip interconnections," *Int. J. Circuit Theory Appl.*, vol. 49, no. 10, pp. 3455–3465, 2021.
- [28] C. Yuan, A. Naguib, and S. Shekhar, "On the design of low-power hybrids for full duplex simultaneous bidirectional signaling links," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 67, no. 4, pp. 1413–1422, Apr. 2020.
- [29] P. Venuturupalli, P. K. Govindaswamy, and V. S. R. Pasupureddi, "Residue monitor enabled charge-mode adaptive echo-cancellation for simultaneous bidirectional signaling over on-chip interconnects," *Microelectron. J.*, vol. 104, Oct. 2020, Art. no. 104899.
- [30] R. Farjadrad, "Simultaneous bidirectional serial link interface with optimized hybrid circuit," U.S. Patent 10552353 B1, Feb. 2020. Accessed: May 31, 2022. [Online]. Available: <https://patents.google.com/patent/US10552353B1/en>
- [31] A. Ebrahimi Jarihani, S. Sarafi, M. Koeberle, J. Sturm, and A. M. Tonello, "A 16 Gbps, full-duplex transceiver over lossy on-chip interconnects in 28 nm CMOS technology," *Electronics*, vol. 9, no. 5, p. 717, May 2020.
- [32] Y.-H. Fan et al., "A 32 Gb/s simultaneous bidirectional source-synchronous transceiver with adaptive echo cancellation in 28nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Apr. 2019, pp. 1–4.
- [33] N. Wary and P. Mandal, "Current-mode full-duplex transceiver for lossy on-chip global interconnects," *IEEE J. Solid-State Circuits*, vol. 52, no. 8, pp. 2026–2037, Aug. 2017.
- [34] I. A. Ukaegbu et al., "Design of full-duplex and multifunction bidirectional CMOS transceiver for optical interconnect applications," *Opt. Quant. Electron.*, vol. 49, no. 7, pp. 7–14, Jun. 2017.

- [35] M. Jalalifar and G.-S. Byun, "An energy-efficient mobile memory I/O interface using simultaneous bidirectional multilevel dual-band signaling," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 64, no. 8, pp. 897–901, Aug. 2017.
- [36] D. Duvvuri, S. Agarwal, and V. S. R. Pasupureddi, "A new hybrid circuit topology for simultaneous bidirectional signaling over on-chip interconnects," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2016, pp. 2342–2345.
- [37] M. T. L. Aung, E. Lim, T. Yoshikawa, and T. T.-H. Kim, "A 3-Gb/s/ch simultaneous bidirectional capacitive coupling transceiver for 3DICs," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 61, no. 9, pp. 706–710, Sep. 2014.
- [38] H.-Y. Huang and R.-I. Pu, "Differential bidirectional transceiver for on-chip long wires," *Microelectron. J.*, vol. 42, no. 11, pp. 1208–1215, Nov. 2011.
- [39] M. Bichan, M. Hossain, and A. Chan Carusone, "Frequency-division bidirectional communication over chip-to-chip channels," *IEEE Trans. Adv. Packag.*, vol. 32, no. 2, pp. 298–305, May 2009.
- [40] C. J. Akl and M. A. Bayoumi, "Wiring-area efficient simultaneous bidirectional point-to-point link for inter-block on-chip signaling," in *Proc. 21st Int. Conf. VLSI Design (VLSID)*, Jan. 2008, pp. 195–200.
- [41] Y. Tomita, H. Tamura, M. Kibune, J. Ogawa, K. Gotoh, and T. Kuroda, "A 20-Gb/s simultaneous bidirectional transceiver using a resistor-transconductor hybrid in 0.11-$\mu$m CMOS," *IEEE J. Solid-State Circuits*, vol. 42, no. 3, pp. 627–636, Mar. 2007.
- [42] J.-H. Kim et al., "A 4-Gb/s/pin low-power memory I/O interface using 4-level simultaneous bi-directional signaling," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 89–101, Jan. 2005.
- [43] R. Mooney, C. Dike, and S. Borkar, "A 900 Mb/s bidirectional signaling scheme," *IEEE J. Solid-State Circuits*, vol. 30, no. 12, pp. 1538–1543, Dec. 1995.
- [44] V. Srinivas, K. J. Bois, D. Knee, D. Quint, M. F. Chang, and I. Verbauwhede, "Gigabit simultaneous bi-directional signaling using DS-CDMA," in *Proc. IEEE 11th Topical Meeting Electr. Perform. Electron. Packag.*, Oct. 2002, pp. 15–18.
- [45] H. Wilson and M. Haycock, "A six-port 30-GB/s nonblocking router component using point-to-point simultaneous bidirectional signaling for high-bandwidth interconnects," *IEEE J. Solid-State Circuits*, vol. 36, no. 12, pp. 1954–1963, Dec. 2001.
- [46] R. J. Drost, "Architecture and design of a simultaneously bidirectional single-ended high-speed chip-to-chip interface," Ph.D. dissertation, Dept. Electr. Eng., Stanford Univ., Stanford, CA, USA, 2001. Accessed: May 31, 2022.
- [47] R. J. Drost and B. A. Wooley, "An 8-Gb/s/pin simultaneously bidirectional transceiver in 0.35-$\mu$m CMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 11, pp. 1894–1908, Nov. 2004, doi: [10.1109/JSSC.2004.835837](https://doi.org/10.1109/JSSC.2004.835837).
- [48] E. Yeung and M. A. Horowitz, "A 2.4 Gb/s/pin simultaneous bidirectional parallel link with per-pin skew compensation," *IEEE J. Solid-State Circuits*, vol. 35, no. 11, pp. 1619–1628, Nov. 2000.
- [49] S. A. Jackson and B. J. Blalock, "A CMOS mixed signal simultaneous bidirectional signaling I/O," in *Proc. Midwest Symp. Circuits Syst.*, Aug. 1998, pp. 37–40.
- [50] T. Takahashi, M. Uchida, T. Takahashi, R. Yoshino, M. Yamamoto, and N. Kitamura, "A CMOS gate array with 600 Mb/s simultaneous bidirectional I/O circuits," *IEEE J. Solid-State Circuits*, vol. 30, no. 12, pp. 1544–1546, Dec. 1995.
- [51] B. Casper, A. Martin, J. E. Jaussi, J. Kennedy, and R. Mooney, "An 8-Gb/s simultaneous bidirectional link with on-die waveform capture," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2111–2120, Dec. 2003.
- [52] K. Lee, S. Kim, G. Ahn, and D.-K. Jeong, "A CMOS serial link for fully duplexed data communication," *IEEE J. Solid-State Circuits*, vol. 30, no. 4, pp. 353–364, Apr. 1995.
- [53] L. Dennison, W. S. Lee, and W. J. Dally, "High-performance bidirectional signalling in VLSI systems," in *Proc. Symp. Res. Integr. Syst.*, 1993, pp. 300–319.
- [54] G. Y. Yacoub and W. H. Ku, "Self-timed simultaneous bidirectional signalling for IC systems," in *Proc. IEEE Int. Symp. Circuits Syst.*, vol. 6, May 1992, pp. 2957–2960.
- [55] K. Lam, L. R. Dennison, and W. J. Dally, "Simultaneous bidirectional signalling for IC systems," in *Proc. IEEE Int. Conf. Computer Design, VLSI Comput. Process.*, Sep. 1990, pp. 430–433.
- [56] D. Jarrett-Amor, K. Yadav, D. Zhang, B. Yang, S. Jalali, and T. C. Carusone, "A 32 Gb/s, 0.42 pJ/bit passive hybrid simultaneous bidirectional transceiver for die-to-die links," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2023, pp. 1–5.
- [57] B. K. Casper, M. Haycock, and R. Mooney, "An accurate and efficient analysis method for multi-Gb/s chip-to-chip signaling schemes," in *Symp. VLSI Circuits Tech. Dig.*, Jun. 2002, pp. 54–57, doi: [10.1109/VLSIC.2002.1015043](https://doi.org/10.1109/VLSIC.2002.1015043).
- [58] S. Palermo, "High-speed serial I/O design for channel-limited and power-constrained systems," in *CMOS Nanoelectronics Analog and RF VLSI Circuits*. New York, NY, USA: McGraw-Hill, 2011, ch. 9.
- [59] S. Palermo, "MATLAB source code from 'ECEN 720: High-speed links circuits and systems'," 2023. Accessed: Jul. 2024. [Online]. Available: <https://people.engr.tamu.edu/spalermo/ecen720.html>
- [60] J. F. Buckwalter, "Signal integrity in reflection-limited channels," in *Proc. IEEE MTT-S Int. Microw. Symp. Dig.*, Atlanta, GA, USA, Jun. 2008, pp. 1565–1568, doi: [10.1109/MWSYM.2008.4633081](https://doi.org/10.1109/MWSYM.2008.4633081).
- [61] D. Jarrett-Amor and T. C. Carusone, "A comparison between single-ended, NRZ unidirectional signaling and single-ended, NRZ simultaneous-bidirectional signaling for die-to-die links," *IEEE Micro*, vol. 45, no. 1, pp. 48–56, Jan./Feb. 2025, doi: [10.1109/MM.2024.3436008](https://doi.org/10.1109/MM.2024.3436008).



**DURAND JARRETT-AMOR** (Member, IEEE) received the M.A.Sc. degree in electrical engineering from Toronto Metropolitan University, Toronto, ON, Canada, in 2017. He is currently pursuing the Ph.D. degree with the Integrated Systems Laboratory, University of Toronto, Toronto, where he is researching pin- and power-efficient, high-speed signaling over die-to-die links.

His research interests include analog- and mixed-signal circuit design for high-speed wireline communication.



**TONY CHAN CARUSONE** (Fellow, IEEE) received the Ph.D. degree from the University of Toronto, Toronto, ON, Canada, in 2002.

Since 2002, he has been a Professor with the Department of Electrical and Computer Engineering, University of Toronto. He has also been a consultant to industry in the areas of integrated circuit design and digital communication since 1997. He is currently the Chief Technology Officer of Alphawave Semi, Toronto. He co-authored Best Student Papers at the 2007, 2008, 2011, and 2022 Custom Integrated Circuits Conferences, the Best Invited Paper at the 2010 Custom Integrated Circuits Conference, the Best Paper at the 2005 Compound Semiconductor Integrated Circuits Symposium, the Best Young Scientist Paper at the 2014 European Solid-State Circuits Conference, and Best Papers at DesignCon 2021 and 2023.

Prof. Carusone has been a Distinguished Lecturer for the IEEE Solid-State Circuits Society (2015–2017 and 2025 onward) and has served on the Technical Program Committees of several IEEE conferences, including the International Solid-State Circuits Conference (2016–2021). He was the Editor-in-Chief of the *IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS* in 2009, an Associate Editor of the *IEEE JOURNAL OF SOLID-STATE CIRCUITS* from 2010 to 2017, and the Editor-in-Chief of the *IEEE SOLID-STATE CIRCUITS LETTERS* from 2021 to 2023. He co-authored the popular textbooks *Analog Integrated Circuit Design* (with D. Johns and K. Martin) and *Microelectronic Circuits*, 8th edition (with A. Sedra, K. C. Smith, and V. Gaudet).