

# A High-speed SerDes Transceiver for Wireless Proximity Communication

Jongsun Kim<sup>1</sup> and Jintae Kim<sup>2</sup>

**Abstract**—This paper presents a serializer and deserializer (SerDes) with a phase interpolator (PI) based digital clock and data recovery (CDR) circuit for high-speed and short-range wireless chip-to-chip communication. The SerDes performs 4:1 muxing and 1:4 demuxing functions. The PI-based digital CDR uses an 8-phase delay-locked loop (DLL) to produce a set of evenly spaced reference clock phases. The phase selector performs 2$\times$ oversampling to recover the data from the input data signal. Implemented in a 65 nm CMOS process, the proposed SerDes achieves a measured data rate of 10 Gbps and a recovered peak-to-peak clock jitter of 21.88 ps. The SerDes occupies an active area of 0.095 mm<sup>2</sup> and dissipates 88 mW at 10 Gbps.

**Index Terms**—SerDes, CDR, clock and data recovery, serializer, deserializer

## I. INTRODUCTION

Recently, the Wireless Gigabit Alliance (WiGig) adopted unlicensed 60 GHz wireless communication as the short-distance, high-speed wireless communication standard, and the IEEE announced the IEEE 802.11ad specification for 60 GHz [1, 2]. The 60 GHz wireless communication system is capable of data rates of up to 6–10 Gbps and can satisfy bandwidth demands in portable and consumer applications. The unprecedented access to the unlicensed spectrum and the small size of the transceiver chipsets make 60 GHz a very attractive spectrum for many potential applications that require low energy consumption and low latency. Fig. 1 shows the block diagram of a simplified 60 GHz transceiver chipset. The transceiver chipset includes a media access control (MAC) layer, a physical (PHY) layer, a serializer and deserializer (SerDes), and an RF module. When transmitting and receiving high-speed wireless data over 10 Gb/s between the host and guest RF transceivers, the SerDes converts the slow parallel data to a high-speed serial data stream on the RF transmitter side and converts the serial data back to parallel data on the RF receiver side.

**Fig. 1.** A block diagram of a simplified 60 GHz wireless chip-to-chip communication chipset with a SerDes.

One of the challenges in the design of an energy-efficient 60 GHz chipset is the implementation of a SerDes that can provide robust performance with low power dissipation, while maintaining a small area, low complexity, and a low bit-error rate.

The power and performance of the SerDes are primarily determined by the clock and data recovery (CDR) circuit [3–9]. CDRs have been widely used in wireline transceivers for backplane and optical applications. Today, CDRs are key building blocks in 60 GHz wireless communication systems. CDRs can be divided into two categories depending on how much digital circuitry they contain: analog CDRs and digital CDRs. A prime example of an analog CDR is the phase-locked loop (PLL)-based CDR. A widely used digital CDR is the oversampling CDR [10–12].

Manuscript received Apr. 7, 2017; accepted Oct. 26, 2017

<sup>1</sup>School of Electronic and Electrical Eng., Hongik University

<sup>2</sup>Dept. of Electronics Eng., Konkuk University

E-mail: js.kim@hongik.ac.kr

In this paper, we introduce a low-power, low-jitter, high-speed SerDes that employs a phase interpolator (PI) based digital CDR for wireless proximity communication [8]. The proposed PI-based digital CDR offers several advantages over PLL-based CDRs, such as faster acquisition time and immunity to process variation. The SerDes chip is fabricated in a 65-nm CMOS process and achieves a measured throughput of 10 Gb/s.

The remainder of this paper is organized as follows: Section II describes the proposed SerDes architecture. Section III describes the circuit design in detail. Section IV shows the implementation results of the fabricated SerDes chip. Finally, the conclusions are given in Section V.

## II. PROPOSED SERDES ARCHITECTURE

Fig. 2 shows the block diagram of the proposed serializer. Fig. 2(a) shows the serializer architecture, which consists of a 4-to-1 serializer, a divide-by-2 divider, a pseudo-random binary sequence (PRBS) generator, and a differential current mode logic (CML) buffer. As shown in Fig. 2(b), the serializer uses two stages of multiplexing to convert the four 2.5 Gbps parallel data streams ($D_1 \sim D_4$) into a differential 10 Gbps/pin serial data stream (Data1234). The serial data is then transmitted to the RF transmitter. Since the on-chip RF transmitter is closely located, a power-hungry equalization technique is not required. Instead, a simple differential CML buffer can be used to reduce power consumption. The SerDes transmitter and receiver share a high-speed 5-GHz reference clock (RCLK).
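As a behavioral illustration of the two-stage multiplexing described above, the following minimal Python sketch interleaves four parallel bit streams into one serial stream. It models only the bit ordering, not the CML circuits or clocking. The function names and the choice to pair $D_1$ with $D_3$ and $D_2$ with $D_4$ in the first stage are assumptions made so that the output order is $D_1, D_2, D_3, D_4$; the paper does not specify the pairing.

```python
def mux2(a, b):
    """Behavioral 2:1 mux: interleave two equal-length bit streams as a0, b0, a1, b1, ..."""
    out = []
    for x, y in zip(a, b):
        out.extend([x, y])
    return out


def serialize_4to1(d1, d2, d3, d4):
    """Two-stage 4:1 serialization (bit ordering only).

    Stage 1 pairs D1/D3 and D2/D4 (an assumed pairing) so that
    stage 2 yields the order D1, D2, D3, D4 for every 4-bit word.
    """
    stage1_a = mux2(d1, d3)  # d1[0], d3[0], d1[1], d3[1], ...
    stage1_b = mux2(d2, d4)  # d2[0], d4[0], d2[1], d4[1], ...
    return mux2(stage1_a, stage1_b)


# Four lanes at 2.5 Gbps each give an aggregate serial rate of 10 Gbps.
LANE_RATE_GBPS = 2.5
serial_rate_gbps = 4 * LANE_RATE_GBPS

serial = serialize_4to1([1, 0], [0, 0], [1, 1], [0, 1])
```

Each 2:1 stage doubles the bit rate, so two stages take the 2.5 Gbps lanes to 5 Gbps and then to 10 Gbps.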

Fig. 3 shows a block diagram of the proposed deserializer, implemented as a digital CDR with 4-bit demultiplexed parallel output data. The proposed PI-based digital CDR consists of eight data receiving samplers, an Early-Late (EL) detector, a phase controller, a frequency divider, four phase selectors, and an 8-phase delay-locked loop (DLL).

**Fig. 2.** Block diagram of the proposed serializer: (a) architecture, (b) 10 Gbps serializer operation.

**Fig. 3.** Block diagram of the proposed deserializer (a digital CDR with 4-bit demultiplexed parallel output data).

In an ideal 60 GHz wireless proximity communication system, the sampler receives small-swing, high-speed serial input data from an RF receiver. In this paper, we verify the operation by using the small-swing differential signal from the CML buffer of the serializer as the input to the deserializer. The minimum input swing level of the sampler for 20 Gbps operation is approximately 7 mV. Because the RF receiver provides an open eye and the CDR is closely located, a complex equalizer is not required at the input of the CDR. The details of the CDR circuit design are discussed in Section III.

In Fig. 3, the 8-phase DLL [13, 14] is used as a reference clock generator for the phase selectors. It generates eight reference clock phases, $\Phi_0 \sim \Phi_7$, uniformly spaced by 45 degrees. The four phase selectors then generate the eight sampling clocks (SCLK0 ~ SCLK7) that are used to recover the data from the high-speed input data signal. Each phase selector consists of two multiplexers and a phase interpolator, which together provide infinite phase rotation. Each phase selector first selects two adjacent clock signals from the eight reference clocks, $\Phi_0 \sim \Phi_7$, and then interpolates them to generate a differential sampling clock according to the control codes (MA[1:0], MB[1:0], and PI[8:0]) from the phase controller. Consequently, the four phase selectors provide the 8-phase sampling clocks (SCLK0 ~ SCLK7) required by the eight samplers for recovering the data using the oversampling technique [3]. Each phase of the sampling clocks is aligned to the input data centers for correct data recovery. The frequency divider receives RCLK as input and generates CLK\_2, at one half of the RCLK frequency, and CLK\_4, at one quarter of the RCLK frequency.
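The clocking numbers in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 5 GHz RCLK value is taken from the serializer description above.

```python
RCLK_GHZ = 5.0   # shared reference clock stated in the text
N_PHASES = 8     # number of DLL reference phases


def reference_phases_deg(n_phases=N_PHASES):
    """Evenly spaced reference phases; 8 phases give 45-degree spacing."""
    return [k * 360.0 / n_phases for k in range(n_phases)]


def divided_clocks_ghz(rclk_ghz=RCLK_GHZ):
    """Frequency divider outputs: CLK_2 = RCLK / 2 and CLK_4 = RCLK / 4."""
    return {"CLK_2": rclk_ghz / 2, "CLK_4": rclk_ghz / 4}


def samples_per_bit(n_samplers=8, bits_per_word=4):
    """Eight samplers recovering a 4-bit word take 2 samples per bit (2x oversampling)."""
    return n_samplers / bits_per_word


phases = reference_phases_deg()
clocks = divided_clocks_ghz()
```

With these values, CLK\_4 comes out at 1.25 GHz, which matches the phase controller clock frequency quoted in Section III.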

## III. CIRCUIT DESCRIPTION

As shown in Fig. 2(a), a 4-to-1 serializer consisting of two stages of 2-to-1 Muxes is used to convert the four 2.5 Gbps parallel data streams into a 10 Gbps serial data stream.

Fig. 4 shows a schematic of the 2-to-1 current mode logic (CML) based Mux, which comprises two CML D flip-flops (D-FF), a CML latch, and a CML Mux. All the unit circuits are based on differential CML circuits.

Fig. 5 shows a schematic of the sense amplifier (SA) based differential sampler [15], which is used as the input receiver of the deserializer. The output of each sampler is used as an input to the EL detector. Eight clock phases are used for sampling the incoming data bits, and a total of eight samplers operate simultaneously to reconstruct the 4-bit parallel data.



**Fig. 4.** 2-to-1 CML based Mux.



**Fig. 5.** Schematic of the SA-based differential sampler.



**Fig. 6.** Early-Late (EL) detector.

**Fig. 7.** Phase controller.

**Fig. 8.** Phase selector.

Fig. 6 shows the proposed Early-Late (EL) detector. The EL detector is an 8-bit parallel bang-bang phase detector (BBPD) followed by an 8-bit 1:2 demultiplexer (DeMux). The EL detector compares the output values of adjacent samplers and generates 8-bit Early<8:1> and Late<8:1> data streams that indicate whether the phases of the sampling clocks are early or late. The front-end BBPDs generate an 8-bit early/late output (PD<8:1>) that is demultiplexed by a factor of 2 to produce the 8-bit Early<8:1> and Late<8:1> data streams. The purpose of the DeMux is to halve the Early/Late update frequency so that the phase controller of Fig. 7 can run at the lower operating frequency of CLK\_4 (= 1.25 GHz). With this DeMux, the phase controller logic synthesized in the 65 nm CMOS process can easily operate at 2.5 GHz or more.
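The early/late decision made from adjacent samples can be illustrated with a generic bang-bang (Alexander-style) rule applied to 2$\times$ oversampled data. This is a behavioral sketch under stated assumptions, not the circuit of Fig. 6: the sample ordering (data, edge, data, edge, ...), the function names, and the polarity convention (edge sample equal to the previous data sample means the clock is early) are assumptions for illustration.

```python
def bang_bang(prev_data, edge, next_data):
    """Return (early, late) for one data/edge/data sample triple.

    No transition between the two data samples gives no timing information.
    Assumed convention: if the edge sample still equals the previous data
    sample, the edge clock fired before the transition (clock early);
    otherwise it fired after the transition (clock late).
    """
    if prev_data == next_data:
        return (0, 0)
    if edge == prev_data:
        return (1, 0)
    return (0, 1)


def early_late(samples):
    """Apply the rule to a 2x oversampled stream ordered d0, e0, d1, e1, d2, ..."""
    early, late = [], []
    for i in range(0, len(samples) - 2, 2):
        e, l = bang_bang(samples[i], samples[i + 1], samples[i + 2])
        early.append(e)
        late.append(l)
    return early, late


early_bits, late_bits = early_late([0, 0, 1, 1, 1, 0, 0])
```

In this example the first triple signals early, the second carries no transition, and the third signals late.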

Fig. 7 shows the proposed phase controller. The phase controller consists of a majority vote logic, a ring counter, and a finite-state machine (FSM). The majority vote logic determines whether the sampling clocks are early or late relative to the incoming data stream by majority voting [3]. The ring counter counts the early/late signal from the majority vote logic and then generates Up/Down signals. The FSM generates the control codes (MA[1:0], MB[1:0], and PI[8:0]) of the phase selector.
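The majority-vote and counting steps can be sketched behaviorally as follows. This is an assumption-laden illustration, not the synthesized logic of Fig. 7: the class name, the threshold value, the reset-on-output behavior, and the mapping of an early majority to an "UP" request are all hypothetical choices, and a simple up/down accumulator stands in for the ring counter.

```python
def majority_vote(early_bits, late_bits):
    """Return +1 if early votes dominate, -1 if late votes dominate, else 0."""
    n_early, n_late = sum(early_bits), sum(late_bits)
    if n_early > n_late:
        return 1
    if n_late > n_early:
        return -1
    return 0


class VoteCounter:
    """Accumulates votes and emits an Up/Down request at a threshold.

    The threshold filters out isolated decisions so that the phase code
    only moves after a consistent run of early or late votes.
    """

    def __init__(self, threshold=4):  # threshold is a hypothetical value
        self.threshold = threshold
        self.count = 0

    def update(self, vote):
        self.count += vote
        if self.count >= self.threshold:
            self.count = 0
            return "UP"      # assumed: early clock -> request more delay
        if self.count <= -self.threshold:
            self.count = 0
            return "DOWN"    # assumed: late clock -> request less delay
        return None


counter = VoteCounter(threshold=2)
vote = majority_vote([1, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1])
first = counter.update(vote)
second = counter.update(vote)
```

A downstream FSM would translate each Up/Down request into a one-step change of the phase selector control codes.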

Fig. 8 shows the proposed phase selector. The deserializer contains four phase selectors that provide the 8-phase sampling clocks (SCLK0 ~ SCLK7) for the eight samplers. Each phase selector consists of two differential 4-to-1 multiplexers (MUX) and a phase interpolator (PI). Two adjacent clock phases are selected from the eight reference clock phases, $\Phi_0 \sim \Phi_7$, according to the code values of MA[1:0] and MB[1:0]. Depending on the control code PI[8:0], the PI interpolates between the two input clock phases to generate a differential output clock with a finer resolution of 1/8 of a phase step.
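The select-then-interpolate behavior can be sketched with an idealized linear model. The code below is a minimal illustration under assumptions: it interprets "1/8 phase step" as one eighth of the 45-degree reference spacing (5.625 degrees), represents the mux selection by a single index `k` and the interpolation by an integer weight `w`, and ignores the actual encoding of MA[1:0], MB[1:0], and PI[8:0], which the text does not detail.

```python
N_PHASES = 8                 # reference phases from the DLL
PHASE_STEP_DEG = 360.0 / N_PHASES
PI_STEPS = 8                 # assumed interpolation steps per reference phase step


def output_phase_deg(k, w):
    """Ideal output phase when interpolating between phase k and phase k+1.

    k selects the adjacent pair (k, (k + 1) % 8); w in 0..7 is the
    interpolation weight toward the second phase (hypothetical encoding).
    """
    if not (0 <= k < N_PHASES and 0 <= w < PI_STEPS):
        raise ValueError("k must be in 0..7 and w in 0..7")
    return ((k + w / PI_STEPS) * PHASE_STEP_DEG) % 360.0


def rotate(k, w, direction):
    """Move one interpolation step up (+1) or down (-1), wrapping around 360 degrees."""
    total = (k * PI_STEPS + w + direction) % (N_PHASES * PI_STEPS)
    return divmod(total, PI_STEPS)


resolution_deg = PHASE_STEP_DEG / PI_STEPS
```

Because `rotate` wraps modulo the full set of 64 positions, the phase can keep moving in one direction indefinitely, which is what allows unlimited phase rotation in this model.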



**Fig. 9.** Layout and chip microphotograph of the proposed SerDes.



**Fig. 10.** (a) Test chip-on-board (CoB), (b) measurement setup.

## IV. MEASUREMENT RESULTS

The proposed SerDes was implemented in a 65 nm CMOS process and tested in a chip-on-board assembly. Fig. 9 shows the chip layout and the microphotograph of the proposed SerDes, which occupies an active area of 0.095 mm<sup>2</sup>. Fig. 10(a) shows the test chip-on-board (CoB) and Fig. 10(b) shows the setup used for the measurement. Since we want to verify the function of the SerDes itself without the RF transceivers, we simply connected the serializer and the deserializer via an on-chip differential wire interconnect.

The CDR and SerDes architectures proposed in this paper were originally designed for ultra-high-speed operation at 20 Gbps/pin. The design operates correctly in simulation at a data rate of 20 Gbps/pin, but in measurement only 10 Gbps/pin operation has been confirmed because of limitations of the measurement equipment used for RCLK generation. We used a pattern generator (Anritsu MP1763C) to generate a differential RCLK. As shown in Fig. 11, the phase of the differential RCLK output is clearly visible at 1 GHz, but the phase starts to change at 5 GHz.

**Fig. 11.** Measured RCLK waveform through a 30-cm SMA cable.

**Fig. 12.** Measured recovered data (2.5 Gbps × 4 = 10 Gbps).

Fig. 12 shows the measured 4-bit parallel data recovered with a PRBS-7 pattern. The deserializer delivers its output as 4-bit parallel data on four output pins (DOUT<4:1>). Thus, for an aggregate data rate of 10 Gbps, each DOUT pin operates at 2.5 Gbps. Due to the limitations of the measurement equipment for RCLK generation, the maximum aggregate data rate measured is 10 Gbps (= 2.5 Gbps × 4).

**Fig. 13.** Measured peak-to-peak jitter: (a) recovered clock, (b) recovered data.

**Table 1.** Performance summary and comparison

|                                | TCAS-II<br>2013 [4] | VLSI<br>2013 [5] | JSSC<br>2007 [6] | JSSC<br>2011 [7] | This work            |
|--------------------------------|---------------------|------------------|------------------|------------------|----------------------|
| Process                        | 90 nm               | 90 nm            | 0.11 $\mu$m      | 0.13 $\mu$m      | 65 nm                |
| Supply                         | 1.2 V               | 1 V              | 1.2 V            | 1.2 V            | 1.2 V                |
| CDR architecture               | PLL-based           | PLL-based        | Oversampling     | PI-based         | PI-based             |
| CDR DEMUX                      | 1:1                 | 1:1              | 1:1              | 1:4              | 1:4                  |
| Data rate (Gbps)               | 12.5                | 5                | 3.2              | 5                | 10                   |
| Power (mW)                     | 84                  | 13.1             | 115              | 18.2             | Ser : 20<br>Des : 68 |
| CDR bit energy (mW/Gbps)       | 6.72                | 2.62             | 35.9             | 3.64             | 6.8                  |
| Recovered clock jitter (pk-pk) | -                   | 44 ps            | -                | 52.22 ps         | 21.88 ps             |
| Chip area (mm<sup>2</sup>)     | 0.823               | 0.62             | 0.15             | 0.4              | 0.095                |
| FOM                            | 5.531               | 1.625            | 5.385            | 1.456            | 0.646                |

FOM = power dissipation (mW) × area (mm<sup>2</sup>) / data rate (Gbps)

Fig. 13 displays the measured jitter of the recovered clock and the eye diagram of the recovered data, respectively. The peak-to-peak jitter of the recovered clock is 21.88 ps and the peak-to-peak jitter of the recovered data signal is 30 ps. The estimated BER is 1e-28 at 10 Gbps. As shown in Table 1, when compared with existing CDRs, the proposed PI-based digital CDR achieves the best (lowest) figure-of-merit (FOM) in terms of power dissipation, die area, and data rate.
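The FOM and bit-energy rows of Table 1 follow directly from the power, area, and data-rate rows. The short sketch below recomputes them from the table values; the dictionary layout and function names are illustrative, and the entry for this work uses the deserializer (CDR) power of 68 mW, as the table does.

```python
# (power in mW, area in mm^2, data rate in Gbps), copied from Table 1
DESIGNS = {
    "[4]": (84.0, 0.823, 12.5),
    "[5]": (13.1, 0.62, 5.0),
    "[6]": (115.0, 0.15, 3.2),
    "[7]": (18.2, 0.4, 5.0),
    "This work": (68.0, 0.095, 10.0),  # deserializer (CDR) power only
}


def fom(power_mw, area_mm2, rate_gbps):
    """FOM = power (mW) x area (mm^2) / data rate (Gbps); lower is better."""
    return power_mw * area_mm2 / rate_gbps


def bit_energy(power_mw, rate_gbps):
    """CDR bit energy in mW/Gbps."""
    return power_mw / rate_gbps


foms = {name: fom(p, a, r) for name, (p, a, r) in DESIGNS.items()}
best = min(foms, key=foms.get)

for name, value in foms.items():
    print(f"{name:10s} FOM = {value:.3f}")
```

The recomputed values agree with the FOM row of Table 1 to within about 0.01, and the smallest value belongs to this work, since a smaller product of power and area per unit data rate is better under this definition.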

## V. CONCLUSIONS

A low-power 10 Gbps SerDes is presented that uses a PI-based digital CDR for energy-efficient short-range wireless chip-to-chip communication. The DLL-based phase-interpolating CDR performs 2 $\times$  oversampling to recover the data from the input signal. Implemented in a 65 nm CMOS process, the proposed SerDes achieves a measured data rate of 10 Gbps and a recovered peak-to-peak clock jitter of 21.88 ps. The SerDes occupies an active area of only 0.095 mm<sup>2</sup> and the CDR dissipates 6.8 mW/Gbps.

## ACKNOWLEDGMENTS

This work was supported by the KIAT grant funded by the Korean government (MOTIE No. N0001883). The EDA tools were supported by IDEC.

## REFERENCES

- [1] A. Tomkins, et al., "A 60 GHz, 802.11ad/WiGig-Compliant Transceiver for Infrastructure and Mobile Applications in 130 nm SiGe BiCMOS," *IEEE J. Solid-State Circuits*, vol. 50, pp. 1-17, Oct. 2015.
- [2] Toshiya Mitomo, et al., "A 2-Gb/s throughput CMOS transceiver chipset with in-package antenna for 60-GHz short-range wireless communication," *IEEE J. Solid-State Circuits*, vol. 47, pp. 3160-3171, Dec. 2012.
- [3] M.-J. E. Lee, et al., "An 84-mW 4-Gb/s Clock and Data Recovery Circuit for Serial Link Applications," *Symp. VLSI Circuits Dig. Tech. Papers*, 2001, pp. 149-152.
- [4] A. Zargaran-Yazd and S. Mirabbasi, "12.5-Gb/s Full-Rate CDR With Wideband Quadrature Phase Shifting in Data Path," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 60, no. 6, pp. 297-301, Jun. 2013.
- [5] G. Shu, et al., "A 5Gb/s 2.6mW/Gb/s Referenceless Half-Rate PRPLL-based Digital CDR," *Symp. VLSI Circuits Dig. Tech. Papers*, 2013, pp. C278-C279.
- [6] M. van Ierssel, et al., "A 3.2 Gb/s CDR Using Semi-Blind Oversampling to Achieve High Jitter Tolerance," *IEEE J. Solid-State Circuits*, vol. 42, no. 10, pp. 2224-2234, Oct. 2007.
- [7] S.-Y. Lee, et al., "250 Mbps–5 Gbps Wide-Range CDR With Digital Vernier Phase Shifting and Dual-Mode Control in 0.13 μm CMOS," *IEEE J. Solid-State Circuits*, vol. 46, no. 11, pp. 2560-2570, Nov. 2011.
- [8] S. Han, T. Kim, J. Kim, and Jongsun Kim, "A 10 Gbps SerDes for wireless chip-to-chip communication," 2015 International SoC Design Conference, pp. 17-18, 2015.
- [9] S. Butala and Behzad Razavi, "A CMOS Clock Recovery Circuit for 2.5-Gb/s NRZ Data," *IEEE Journal Solid-State Circuits*, vol. 36, no. 3, pp. 432-38, March 2001.
- [10] M.-J. Edward Lee, W.-J. Dally, John W. Poulton, P. Chiang, and S. Greenwood, "An 84-mW 4-Gb/s Clock and Data Recovery Circuit for Serial Link Applications," Symp. on VLSI Circuits Digest of Technical Papers, pp. 149-52, 2001.
- [11] K. Lee, S. Kim, Gijung Ahn, and Deog-Kyoong Jeong, "A CMOS Serial Link for Fully Duplexed Data Communication," *IEEE Journal Solid-State Circuits*, Vol. 30, No. 4, pp. 353-64, April, 1995.
- [12] Sungjoon Kim, Kyeonghoee, Deog-Kyoong, David D. Lee, and Andreas G. Nowatzky, "An 800Mbps Multi-Channel serial Link with 3X Oversampling," IEEE Custom Integrated Circuits Conference, pp. 451-54, 1995.
- [13] Jongsun Kim, et al., "A high-resolution dual-loop digital DLL," *Journal of Semiconductor Technology and Science*, vol. 16, no. 4, pp. 520-527, Aug. 2016.
- [14] D. Lee and Jongsun Kim, "5 GHz all-digital delay-locked loop for future memory systems beyond double data rate 4 synchronous dynamic random access memory," *IET Electronics Letters*, vol. 51, no. 24, pp. 1973-1975, Nov. 2015.
- [15] M.-J. E. Lee, W. J. Dally, and P. Chiang, "Low-power area efficient high speed I/O circuit techniques," *IEEE J. Solid-State Circuits*, vol. 35, pp. 1591-1599, Nov. 2000.



**Jongsun Kim** received his Ph.D. degree in electrical engineering from the University of California, Los Angeles (UCLA) in 2006 in the field of Integrated Circuits and Systems. He was a postdoctoral fellow at UCLA from 2006 to 2007. From 1994 to 2008, he was with Samsung Electronics as a senior research engineer in the DRAM Design Team, where he worked on the design and development of SDRAMs, SGDRAMs, Rambus DRAMs, DDR3 and DDR4 DRAMs. Dr. Kim joined the School of Electronic & Electrical Engineering, Hongik University in March 2008. Professor Kim's research interests are in the areas of high-performance mixed-signal circuits and systems design. His research areas include high-speed and low-power I/O interface circuits, clock recovery circuits (PLLs/DLLs/CDRs), signal integrity and power integrity, low-power memories, and power-management ICs (PMICs). Prof. Kim is a member of IEEE, IEIE, and IEICE.



**Jintae Kim** received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1997, and the M.S. and Ph.D. degrees in electrical engineering from University of California, Los Angeles, CA, in 2004 and 2008, respectively. He held various industry positions at Barcelona Design, CA, SiTime Corporation, CA, and Agilent Technologies, CA, as a key technical contributor for their high-speed A/D converters and timing IC products. Since 2012, he has been an assistant and associate professor at Konkuk University, Seoul, Korea, where he is focusing on low power mixed-signal IC designs for communication and sensor applications. Dr. Kim is a recipient of the IEEE Solid-State Circuits Predoctoral Fellowship in 2007.