

# All-Digital High-Resolution Frequency Measurement SoC for Rapid MEMS Readouts

Hitesh Kumar Sahu<sup>1</sup>, Emon Sarkar<sup>1</sup>, Pushkar Sathe<sup>1</sup>, Laxmeesha Somappa<sup>1,2</sup>

<sup>1</sup>*Department of Electrical Engineering, IIT Bombay, Mumbai, India*

<sup>2</sup>*IIT Bombay Centre for Semiconductor Technologies (SEMIX), India*

Email: laxmeesha@ee.iitb.ac.in

**Abstract**—This work proposes an all-digital, on-chip frequency measurement system (FMS) intended to replace conventional low-resolution measuring instruments. Measurement is a two-step process: a coarse estimate is followed by a fine-resolution frequency estimate from three DFT samples using the Candan method. We describe the design and performance evaluation of this high-resolution, rapid frequency measurement system, which targets high-precision MEMS applications. The proposed FMS has been synthesized, implemented, and verified on a Xilinx ZYNQ XC7Z020-1CLG400C SoC. The ASIC implementation of the SoC in a 65nm CMOS technology consumes 120 mW at a 100 MHz clock while providing sub-5 ppm resolution with a readout time of 5.7 $\mu$s, making it suitable for MEMS applications with bandwidths up to 10 MHz.

**Index Terms**—Frequency estimation, Fast Fourier Transform (FFT), Candan Method, Frequency counters, Time to digital converters, MEMS.

## I. INTRODUCTION

The precision and stability of clocks in integrated systems can significantly limit performance, as demonstrated by the need for clock synchronization in GPS receivers. Because of their size and power requirements, advanced clocks such as atomic clocks are often impractical for portable use, which confines these powerful instruments to fixed settings and remote access. Miniaturization and power reduction in portable timekeeping and frequency-reference devices are therefore highly sought after [1]. Microelectromechanical systems (MEMS) technology stands out as an important advancement in this area: it allows mechanical structures and systems to be miniaturized to micron and even nanometer scales. This capability has already yielded considerable reductions in size and power consumption in a variety of applications, including displays, sensors, and fluidic systems [1]. MEMS technology is also positioned to provide improved mechanisms for timekeeping and frequency control in portable wireless devices, addressing a critical requirement in this domain [2], [3].

Fig. 1: Nyquist-sampled, all-digital frequency readout system architecture.

Gyroscopes measure the angular velocity of the frame into which they are integrated. MEMS gyroscopes, known for their compact size and low power consumption, are used extensively in consumer electronics, automotive safety systems, and robotics [4]. High-frequency bulk-acoustic-wave (BAW) gyroscopes offer further benefits, including higher sensitivity, a broader dynamic range, and better shock and vibration tolerance, which makes them well suited to such applications [5]. Their performance hinges on matching the frequencies of high-Q gyroscopic resonant modes, especially in the 1–10 MHz range. Gravimetric sensors based on microfabricated in-plane bulk resonators have also been developed in this frequency range [6], [7]. Because these resonant devices rely on accurate measurement of frequency shifts, they require a portable, high-precision frequency measurement instrument that minimizes errors in noisy environments.

Tabletop frequency counters are commonly used for frequency measurement. They determine the frequency of the input signal by counting pulses within a predefined measurement window known as the “gate time”. Before computing the frequency, these instruments operate as reciprocal frequency counters [8] that determine the period of the input signal over this gate time. The resolution of such systems is inversely proportional to the gate time and is further limited by the precision of the internal oscillator. Consequently, their timebase is usually an oven-controlled crystal oscillator (OCXO), which increases the overall size of the measurement system. Furthermore, internal reference oscillators drift, which introduces errors into the acquired measurements.
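To put the gate-time trade-off in numbers, the sketch below computes the gate time a simple counter needs to reach a target relative resolution. It assumes a ±1-count quantization error and no interpolation (commercial instruments such as [8] use interpolators, so this is only an order-of-magnitude illustration); the 100 MHz reference clock in the example is a hypothetical value.

```python
def counter_gate_time(rel_resolution: float, counted_clock_hz: float) -> float:
    """Gate time (s) for which a +/-1 count equals rel_resolution of the total count.

    Assumption: no interpolation. For a direct counter the counted clock is the
    input signal itself; for a reciprocal counter it is the reference clock.
    """
    # Counting a clock of frequency f for T seconds gives f*T counts,
    # so a +/-1 count error is a relative error of 1 / (f*T).
    return 1.0 / (rel_resolution * counted_clock_hz)


if __name__ == "__main__":
    target = 5e-6  # 5 ppm
    direct = counter_gate_time(target, 3e6)          # direct count of a 3 MHz input
    reciprocal = counter_gate_time(target, 100e6)    # hypothetical 100 MHz reference
    print(f"direct counter:     {direct * 1e3:.1f} ms")
    print(f"reciprocal counter: {reciprocal * 1e3:.1f} ms")
```

Under these assumptions, 5 ppm requires about 66.7 ms of gate time for a direct count of a 3 MHz input and 2 ms for a reciprocal counter with a 100 MHz reference, i.e. milliseconds rather than microseconds.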

Fig. 2: Datapath of the all-digital Candan frequency readout architecture using a 20-bit fixed-point representation.

Time-to-digital converters (TDCs) have become essential for high-resolution time and frequency measurements, notably in physics research and time-of-flight (ToF) applications. There is also a noticeable trend toward incorporating Field-Programmable Gate Arrays (FPGAs) into finished products, departing from the earlier perception of FPGAs as purely prototyping platforms [9]. Nonetheless, TDCs based on Application-Specific Integrated Circuits (ASICs) differ significantly from those based on FPGA technology. FPGA-based TDCs have advanced tremendously, attaining resolutions below 10 picoseconds (ps) [10]–[14], and Szplet *et al.* [15] have reported resolutions below 1 ps. The subsequent Xilinx UltraScale FPGA architecture has enabled research into various TDC configurations, such as [16]–[19]. Furthermore, the growing resources available in FPGA devices have spurred the exploration of multi-chain TDC designs, which provide better linearity than traditional Tapped Delay Line (TDL) approaches.

Recent literature [20]–[25] comprehensively addresses TDC developments, especially FPGA-based TDCs that enable faster conversion, scalability, and flexibility. However, these TDCs have dynamic-range constraints for certain clock frequencies or measurement durations, which limits high-resolution frequency monitoring. Handling varied bulk acoustic resonator devices therefore calls for a complete solution that integrates a tunable oscillator with flexible, programmable measurement methods, particularly within the 1–10 MHz frequency range. In this context, the authors of [26] proposed a portable, programmable frequency measurement system (PrO-FMS) designed to measure frequency with excellent resolution in a relatively short measurement interval. That implementation compares five frequency estimation techniques: Prony's method [27], the Modified Pisarenko method [28], zero-crossing (ZC) interpolation [29], the Candan method [30], and the Djukanovic method [31]. The comparison shows that the Candan method consistently produces the most accurate frequency estimates.

This work is motivated by [26]. It aims to realize an all-digital frequency measurement system based on the Candan approach while balancing the trade-offs between resolution, readout time, and area/power consumption. An additional benefit of the technique is that it uses a Nyquist-rate ADC for high-resolution readout, unlike approaches based on oversampled converters. Figure 1 shows a conceptual block diagram of the system. Because the Candan technique operates on DFT samples, a Fast Fourier Transform (FFT) hardware block is essential, and a variable-length FFT further enhances estimation accuracy. This work fully digitizes the frequency measurement system, using the Cooley-Tukey with Matrix Transposition (CTMT) method [32] to compute 256-point FFTs, denoted CTMT-256. The system further integrates a max function block and a Candan computation block that yields the final measured frequency.

## II. PROPOSED ARCHITECTURE

### A. Cooley-Tukey with Matrix Transposition Architecture

The Cooley-Tukey algorithm is a prominent and widely used method for FFT computation, and it can be further improved by combining it with matrix transposition. Matrix transposition reduces the need for alternate row-wise/column-wise matrix access (AR/CMA), yielding efficient memory-access patterns during the computation [33]. With this approach, a large 1-D FFT is divided into batches of smaller FFTs. Let $N = N_1 \times N_2$ be the size of the large FFT, where $N_1$ and $N_2$ are both powers of 2. The Cooley-Tukey algorithm is then defined by the following equations:

$$X[k_1 N_2 + k_2] = \sum_{n_1=0}^{N_1-1} \left[ W_N^{n_1 k_2} \cdot A(n_1, k_2) \right] W_{N_1}^{n_1 k_1} \quad (1)$$

$$A(n_1, k_2) = \sum_{n_2=0}^{N_2-1} x[n_2 N_1 + n_1] \, W_{N_2}^{n_2 k_2} \quad (2)$$

where $W_N^{nk} = e^{-j 2\pi nk/N}$, $0 \leq k_1 \leq N_1 - 1$, and $0 \leq k_2 \leq N_2 - 1$.
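Eqs. (1) and (2) can be checked numerically. The NumPy sketch below is an illustrative model rather than the hardware datapath (the name `ctmt_fft` is ours, and the small sub-DFTs use `numpy.fft.fft` for brevity). It maps the input index $n = n_2 N_1 + n_1$ to a matrix, performs the length-$N_2$ DFTs of Eq. (2), applies the twiddle factors $W_N^{n_1 k_2}$, performs the length-$N_1$ DFTs of Eq. (1), and reads the result out in the order $k = k_1 N_2 + k_2$.

```python
import numpy as np


def ctmt_fft(x: np.ndarray, n1: int, n2: int) -> np.ndarray:
    """Cooley-Tukey decomposition of an N = n1*n2 point DFT, following Eqs. (1)-(2)."""
    N = n1 * n2
    x = np.asarray(x, dtype=complex)
    if x.shape != (N,):
        raise ValueError(f"expected {N} samples, got {x.shape}")

    # a[i2, i1] = x[i2*n1 + i1]  (input index n = n2*N1 + n1)
    a = x.reshape(n2, n1)

    # Eq. (2): length-N2 DFT over n2 for every n1 -> A[k2, n1]
    A = np.fft.fft(a, axis=0)

    # Twiddle factors W_N^{n1*k2}
    k2 = np.arange(n2)[:, None]
    m1 = np.arange(n1)[None, :]
    A = A * np.exp(-2j * np.pi * k2 * m1 / N)

    # Eq. (1): length-N1 DFT over n1 for every k2 -> B[k2, k1]
    B = np.fft.fft(A, axis=1)

    # Output index k = k1*N2 + k2: transpose to [k1, k2] and flatten row-major
    return B.T.reshape(N)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(256) + 1j * rng.standard_normal(256)
    print(np.allclose(ctmt_fft(x, 16, 16), np.fft.fft(x)))  # True
```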

Fig. 3: Sinusoid SNR response with varying word length and fractional length (for 256-point FFT).

The CTMT\_256 block shown in Fig. 2 is designed to compute the FFT of 256 data points using a 20-bit fixed-point encoding in which the MSB is the sign bit, the next 7 bits represent the integer part, and the remaining 12 bits represent the fractional part. This format was selected through simulations of the CTMT algorithm with varying word lengths and fractional lengths. From the SNR plot in Fig. 3, we conclude that a 20-bit word length with a 12-bit fractional length, which provides the maximal SNR, is the optimal representation for minimal hardware.
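The chosen format has an LSB of $2^{-12} \approx 2.44 \times 10^{-4}$ and a range of $[-128,\ 128 - 2^{-12}]$. The sketch below shows one way to quantize values into it, assuming round-to-nearest and saturation (the text does not state the hardware's rounding or overflow behaviour). The helper `quantization_snr_db` is a simple input-quantization proxy, not a reproduction of the FFT-output SNR plotted in Fig. 3.

```python
import numpy as np

WORD_LEN = 20   # total bits, including the sign bit
FRAC_LEN = 12   # fractional bits; 20 - 1 - 12 = 7 integer bits


def to_fixed(x, word_len=WORD_LEN, frac_len=FRAC_LEN):
    """Quantize real values to signed two's-complement integer codes (round, then saturate)."""
    lo = -(1 << (word_len - 1))
    hi = (1 << (word_len - 1)) - 1
    codes = np.round(np.asarray(x, dtype=float) * (1 << frac_len))
    return np.clip(codes, lo, hi).astype(np.int64)


def from_fixed(codes, frac_len=FRAC_LEN):
    """Convert integer codes back to real values."""
    return np.asarray(codes, dtype=float) / (1 << frac_len)


def quantization_snr_db(x, word_len=WORD_LEN, frac_len=FRAC_LEN):
    """SNR (dB) of x after a round trip through the fixed-point format."""
    x = np.asarray(x, dtype=float)
    err = x - from_fixed(to_fixed(x, word_len, frac_len), frac_len)
    return 10 * np.log10(np.sum(x**2) / np.sum(err**2))


if __name__ == "__main__":
    s = np.sin(2 * np.pi * 39 * np.arange(256) / 256)
    for frac in (8, 10, 12, 14):
        print(frac, round(quantization_snr_db(s, WORD_LEN, frac), 1))
```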

The computing blocks are built from two core modules: a 16-point radix-2 butterfly FFT and a complex multiplier that multiplies the 256 serial data points by the corresponding twiddle factors. The computational cost depends on how the 256 data points are factored as $N = N_1 \times N_2$: the numbers of complex multiplications and additions are $N(N_1 + N_2 + 1)$ and $N(N_1 + N_2 - 2)$, respectively, so for a fixed $N$ the decomposition alone determines the complexity. Because $N_1 + N_2$ is minimized when $N_1 = N_2 = \sqrt{N}$, we divide the 256 data points into $16 \times 16$.
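The effect of the decomposition can be tabulated directly from the two expressions quoted above. The sketch below (helper names are ours) enumerates the power-of-two factorizations of 256 and evaluates both counts; the balanced $16 \times 16$ split gives the minimum.

```python
def ctmt_op_counts(n1: int, n2: int) -> tuple[int, int]:
    """(complex multiplications, complex additions) for an N = n1*n2 decomposition,
    using the expressions N(N1+N2+1) and N(N1+N2-2) quoted in the text."""
    n = n1 * n2
    return n * (n1 + n2 + 1), n * (n1 + n2 - 2)


def power_of_two_splits(n: int) -> list[tuple[int, int]]:
    """All (n1, n2) with n1*n2 == n, both powers of two and >= 2 (n itself a power of two)."""
    splits = []
    n1 = 2
    while n1 < n:
        splits.append((n1, n // n1))
        n1 *= 2
    return splits


if __name__ == "__main__":
    for n1, n2 in power_of_two_splits(256):
        mults, adds = ctmt_op_counts(n1, n2)
        print(f"{n1:>3} x {n2:<3}  mults={mults:>6}  adds={adds:>6}")
```

For $16 \times 16$ this gives 8448 multiplications and 7680 additions, versus 33536 and 32768 for $2 \times 128$.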

In the proposed architecture, the computation is carried out in three phases. In the first phase, 16 data points are retrieved from the main buffer in each cycle using column-wise addressing, which realizes the first matrix transposition; they pass through the radix-2 butterfly FFT and are stored in a transitional buffer, taking 16 cycles in total. In the second phase, data are read one at a time from the transitional buffer, multiplied by the appropriate twiddle factor, and written back; this phase requires 256 clock cycles.

The third phase fetches 16 data points at a time from the transitional buffer using row-wise addressing, which corresponds to the second matrix transposition. After the radix-2 butterfly FFT processes these 16 points, the outputs are stored in a parallel-in serial-out register. In each cycle, one data point is sent to the Max\_Value detector shown in Fig. 2 to track the peak value and is written back into the main buffer using column-wise addressing, which constitutes the third matrix transposition. This phase also requires 256 cycles. Realizing the three sequential matrix transpositions through the addressing of the transitional buffer reduces hardware complexity as well as dynamic power dissipation, improving the system's performance and overall energy efficiency.
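The three-phase schedule can be modelled behaviourally to check the cycle budget. The sketch below is a hypothetical Python model: the buffer names, the one-column-per-cycle and one-sample-per-cycle granularity, and the overlap of the phase-3 row fetch with the serial shift-out are our reading of the description above, and `numpy.fft.fft` stands in for the 16-point radix-2 butterfly. It counts 16 + 256 + 256 = 528 cycles and checks the result against a reference FFT.

```python
import numpy as np

N1 = N2 = 16
N = N1 * N2


def ctmt256_schedule(x):
    """Behavioural model of the 3-phase CTMT-256 dataflow; returns (spectrum, cycles)."""
    main_buf = np.asarray(x, dtype=complex).copy()   # main buffer, natural input order
    trans_buf = np.zeros((N2, N1), dtype=complex)     # transitional buffer, indexed [k2, n1]
    cycles = 0

    # Phase 1: one column per cycle. Column-wise addressing gathers x[n2*N1 + n1]
    # for n2 = 0..15 (first transposition); a 16-point FFT produces A[k2, n1].
    for n1 in range(N1):
        column = main_buf[np.arange(N2) * N1 + n1]
        trans_buf[:, n1] = np.fft.fft(column)
        cycles += 1

    # Phase 2: one sample per cycle, multiplied by the twiddle factor W_N^{n1*k2}
    # and written back into the transitional buffer.
    for k2 in range(N2):
        for n1 in range(N1):
            trans_buf[k2, n1] *= np.exp(-2j * np.pi * n1 * k2 / N)
            cycles += 1

    # Phase 3: row-wise fetch of 16 points (second transposition), 16-point FFT over n1;
    # a parallel-in serial-out register then emits one result per cycle, written back
    # column-wise (third transposition) to index k = k1*N2 + k2.
    # Assumption: the row fetch overlaps the shift-out, so only output cycles are counted.
    for k2 in range(N2):
        row = np.fft.fft(trans_buf[k2, :])
        for k1 in range(N1):
            main_buf[k1 * N2 + k2] = row[k1]
            cycles += 1

    return main_buf, cycles


if __name__ == "__main__":
    x = np.cos(2 * np.pi * 39 * np.arange(N) / N)
    X, cycles = ctmt256_schedule(x)
    print(cycles, np.allclose(X, np.fft.fft(x)))  # 528 True
```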

### B. Max Function Architecture

Fig. 4: Inverse tangent computation using a simplified look-up approach and coordinate mapping.

The max function shown in Fig. 2 uses bitwise comparison to determine the maximum of two binary inputs, from which the peak bin $k_p$ is identified. At the same time, it extracts the two adjacent values ($k_p - 1$ and $k_p + 1$), thereby providing the three DFT points required by the Candan frequency estimator. The comparator is instantiated for 20-bit operands. For complex values, it avoids the square root in $|X| = \sqrt{\mathrm{Re}^2 + \mathrm{Im}^2}$ and instead compares the sum of the magnitudes of the real and imaginary components, $|\mathrm{Re}| + |\mathrm{Im}|$. This heuristic reduces computational complexity while still identifying the maximum value effectively.
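A behavioural sketch of the max function follows. It compares candidates with the $|\mathrm{Re}| + |\mathrm{Im}|$ heuristic and returns $k_p$ together with the three DFT samples needed by the Candan estimator. Restricting the search to bins $1 \ldots N/2 - 1$ (so that both neighbours exist and only the positive-frequency half of a real input's spectrum is considered) is our assumption; the text does not state the search range. Note that $|\mathrm{Re}| + |\mathrm{Im}|$ exceeds the true magnitude by up to a factor of $\sqrt{2}$ depending on phase, so it orders values only approximately.

```python
import numpy as np


def l1_magnitude(z: complex) -> float:
    """Square-root-free magnitude heuristic |Re| + |Im|."""
    return abs(z.real) + abs(z.imag)


def max_with_neighbours(spectrum):
    """Return (k_p, (X[k_p-1], X[k_p], X[k_p+1])) using the |Re|+|Im| heuristic.

    Assumption: only bins 1 .. N/2 - 1 are searched, so both neighbours exist
    and only the positive-frequency half of a real signal's spectrum is used.
    """
    X = np.asarray(spectrum, dtype=complex)
    half = len(X) // 2
    k_p, best = 1, l1_magnitude(X[1])
    for k in range(2, half):
        m = l1_magnitude(X[k])
        if m > best:   # pairwise comparison against the running maximum
            k_p, best = k, m
    return k_p, (X[k_p - 1], X[k_p], X[k_p + 1])


if __name__ == "__main__":
    n = np.arange(256)
    X = np.fft.fft(np.cos(2 * np.pi * 39.3 * n / 256))
    k_p, trio = max_with_neighbours(X)
    print(k_p)  # 39
```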

### C. Candan Computation Architecture

The Candan frequency estimation architecture implements the fine-frequency estimation using the three DFT points generated by the max function as inputs, in the same 20-bit fixed-point representation (MSB as sign bit, 7 integer bits, 12 fractional bits). The delta value (fine estimate) is output in the same representation. The datapath of the Candan estimator is shown in Fig. 2. The inverse tangent unit that produces the estimate is built on the CORDIC algorithm shown in Fig. 4 and evaluates the $\delta$-estimate of the Candan algorithm [30]:

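$$\delta = \frac{N}{\pi} \tan^{-1}\left( \tan\left(\frac{\pi}{N}\right) \operatorname{Re}\left\{ \frac{R[k_p-1] - R[k_p+1]}{2R[k_p] - R[k_p+1] - R[k_p-1]} \right\} \right) \quad (3)$$

where $R[k]$ is the DFT output at bin $k$, $k_p$ is the peak bin, $N = 256$ is the FFT length, and $\delta$ is the fine frequency offset in bins, so that the estimated frequency is $\hat{f} = (k_p + \delta) f_s / N$ for a sampling frequency $f_s$.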
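A floating-point reference for the Candan estimator is sketched below, following [30] with $\delta$ expressed in DFT bins, $\delta = (N/\pi)\tan^{-1}(\tan(\pi/N)\,\mathrm{Re}\{\cdot\})$, and the frequency recovered as $(k_p + \delta) f_s / N$. For a single noise-free complex exponential this expression recovers the offset exactly, which the example exercises with a tone 0.3 bins above bin 39. The helper names are ours, and no fixed-point effects are modelled.

```python
import numpy as np


def candan_delta(x_m1: complex, x_0: complex, x_p1: complex, N: int) -> float:
    """Fine offset (in bins) from the three DFT samples around the peak, per Candan [30]."""
    ratio = (x_m1 - x_p1) / (2 * x_0 - x_m1 - x_p1)
    return (N / np.pi) * np.arctan(np.tan(np.pi / N) * ratio.real)


def candan_frequency(k_p: int, delta: float, N: int, fs: float) -> float:
    """Convert the peak bin plus fractional offset into a frequency in Hz."""
    return (k_p + delta) * fs / N


if __name__ == "__main__":
    N, fs = 256, 20e6
    n = np.arange(N)
    x = np.exp(2j * np.pi * 39.3 * n / N)   # complex tone 0.3 bin above bin 39
    X = np.fft.fft(x)
    d = candan_delta(X[38], X[39], X[40], N)
    print(d, candan_frequency(39, d, N, fs))
```

The printed values are approximately 0.3 bins and 3.0703125 MHz.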

Eq. (3) is implemented in hardware by decomposing it into smaller algebraic expressions, with a set of pipelined arithmetic units computing the numerator and denominator. These are passed to the inverse tangent unit as the Y and X coordinates, respectively, which computes $\delta$ using the CORDIC algorithm as shown in Fig. 4. The CORDIC unit is iterative and takes 38 cycles to produce its output; its implementation details are omitted here.
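Since the CORDIC unit is not detailed, the following is only a generic vectoring-mode CORDIC sketch. One way to hand Eq. (3) to it without a divider (our assumption, not necessarily the authors' mapping) uses $\mathrm{Re}\{a/b\} = \mathrm{Re}\{a\,b^*\}/|b|^2$: with $Y = \tan(\pi/N)\,\mathrm{Re}\{\mathrm{num} \cdot \mathrm{den}^*\}$ and $X = |\mathrm{den}|^2 > 0$, the CORDIC evaluates $\tan^{-1}(Y/X)$ with $X$ always positive, inside its convergence range. The iteration count and floating-point arithmetic are illustrative; a hardware version would use shifts and adds on fixed-point words and a table of $\tan^{-1}(2^{-i})$.

```python
import math


def cordic_atan(y: float, x: float, iterations: int = 24) -> float:
    """Vectoring-mode CORDIC: rotate (x, y) toward the x-axis, accumulating the angle.

    Valid for x > 0, where the result approximates atan(y / x).
    """
    if x <= 0:
        raise ValueError("this sketch assumes x > 0")
    z = 0.0
    for i in range(iterations):
        step = 2.0 ** -i                  # a right shift by i bits in hardware
        if y > 0:                         # rotate clockwise by atan(2^-i)
            x, y = x + y * step, y - x * step
            z += math.atan(step)          # table entry in hardware
        else:                             # rotate counter-clockwise by atan(2^-i)
            x, y = x - y * step, y + x * step
            z -= math.atan(step)
    return z


def candan_delta_cordic(r_m1: complex, r_0: complex, r_p1: complex, N: int) -> float:
    """Candan offset (bins) with Re{num/den} formed without a division (assumed mapping)."""
    num = r_m1 - r_p1
    den = 2 * r_0 - r_m1 - r_p1
    y = math.tan(math.pi / N) * (num * den.conjugate()).real   # Y coordinate
    x = abs(den) ** 2                                            # X coordinate, > 0
    return (N / math.pi) * cordic_atan(y, x)


if __name__ == "__main__":
    print(cordic_atan(1.0, 1.0), math.pi / 4)
```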

## III. EXPERIMENTS AND RESULTS

### A. FPGA Implementation

The overall FMS architecture shown in Fig. 2 was validated with ModelSim simulations before being synthesized and implemented on a Xilinx ZYNQ XC7Z020-1CLG400C SoC. Table I shows the device utilization of the proposed system after implementation. Slice LUTs account for 97.19% of the available resources, whereas only 24.31% of the slice registers are used. Despite the heavy slice-LUT usage, the system achieves high performance in terms of readout time and resolution.

Fig. 5: CTMT\_256 results obtained for two test frequencies based on the hardware implementation.

### B. Verification Setup and Results

Two single-tone sinusoids at 3.046875 MHz and 5.079125 MHz, sampled at 20 MHz, were used to verify the operation of the proposed CTMT-256 and Candan-based FMS illustrated in Fig. 2. Coherent sampling was used for the base frequency as a baseline reference, so that the frequency component falls precisely in one FFT bin and spectral leakage is avoided. The resulting FFT output is shown in Fig. 5, and the relative error obtained by the Candan architecture for the two frequency bins is shown in Fig. 6.
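With $f_s = 20$ MHz and $N = 256$, the bin spacing is $f_s/N = 78.125$ kHz, and a tone is coherently sampled when $f N / f_s$ is an integer; 3.046875 MHz, for example, falls exactly on bin 39. The small helpers below (hypothetical names) compute the bin position of a test tone and the nearest coherent frequency.

```python
def bin_position(f_hz: float, fs_hz: float = 20e6, n_fft: int = 256) -> float:
    """Fractional DFT bin at which a tone of frequency f_hz falls."""
    return f_hz * n_fft / fs_hz


def nearest_coherent_frequency(f_hz: float, fs_hz: float = 20e6, n_fft: int = 256) -> float:
    """Closest frequency that completes an integer number of cycles in n_fft samples."""
    return round(bin_position(f_hz, fs_hz, n_fft)) * fs_hz / n_fft


if __name__ == "__main__":
    print(bin_position(3.046875e6))                # 39.0
    print(nearest_coherent_frequency(3.05e6))      # 3046875.0
```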

To obtain the resolution, the maximum error (in parts per million, ppm) of the frequency estimate was evaluated by setting the base frequency $f_{in}$ on a DFT bin and stepping the input frequency in fixed offsets of $\pm 5$ kHz, as shown in Fig. 6, up to half of a DFT bin. We observe an error of 2.63 ppm for the 3 MHz base frequency and 1.91 ppm for the 5 MHz base frequency. Including the max function, the CTMT\_256 block requires 528 cycles, and the Candan estimation block requires an additional 38 cycles to calculate $\delta$. The maximum operating frequency on the FPGA was 50 MHz, so the readout time of the proposed configuration is $(528 + 38) \times (1/50\,\text{MHz}) = 11.32\,\mu\text{s}$.
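A floating-point approximation of this sweep is sketched below: it generates a real tone, takes a 256-point FFT, picks the positive-frequency peak bin, applies the Candan estimate in bins, and reports the relative error in ppm for offsets stepped by 5 kHz up to half a bin (39.0625 kHz). It models neither the 20-bit quantization, the $|\mathrm{Re}| + |\mathrm{Im}|$ comparator, nor the CORDIC (all simplifying assumptions), so its numbers are not expected to reproduce Fig. 6. The `readout_time_s` helper restates the cycle arithmetic of the readout-time calculation.

```python
import numpy as np

FS = 20e6
N = 256
BIN_HZ = FS / N          # 78.125 kHz


def estimate_frequency(x: np.ndarray) -> float:
    """Coarse FFT peak + Candan fine estimate (floating point, no quantization)."""
    X = np.fft.fft(x, N)
    half = N // 2
    k_p = 1 + int(np.argmax(np.abs(X[1:half])))   # positive-frequency peak, skips DC
    ratio = (X[k_p - 1] - X[k_p + 1]) / (2 * X[k_p] - X[k_p - 1] - X[k_p + 1])
    delta = (N / np.pi) * np.arctan(np.tan(np.pi / N) * ratio.real)
    return (k_p + delta) * BIN_HZ


def ppm_error_sweep(f_base: float, step_hz: float = 5e3):
    """(offset_hz, error_ppm) for offsets 0, +/-step, +/-2*step, ... up to half a bin."""
    n = np.arange(N)
    max_steps = int((BIN_HZ / 2) // step_hz)
    results = []
    for i in range(-max_steps, max_steps + 1):
        f_in = f_base + i * step_hz
        f_hat = estimate_frequency(np.cos(2 * np.pi * f_in * n / FS))
        results.append((i * step_hz, (f_hat - f_in) / f_in * 1e6))
    return results


def readout_time_s(fft_cycles: int = 528, candan_cycles: int = 38,
                   f_clk_hz: float = 50e6) -> float:
    """Readout time: total cycles divided by the clock frequency."""
    return (fft_cycles + candan_cycles) / f_clk_hz


if __name__ == "__main__":
    for offset, err in ppm_error_sweep(3.046875e6):
        print(f"{offset / 1e3:+6.1f} kHz  {err:+8.3f} ppm")
    print(readout_time_s() * 1e6, "us at 50 MHz")
```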



Fig. 6: Relative error calculated for frequency estimation.

TABLE I: Resource utilization by proposed FMS architecture on Xilinx Zynq XC7Z020-1CLG400C SoC

| Resources       | Available on FPGA | Used  | Utilization |
|-----------------|-------------------|-------|-------------|
| Slice LUTs      | 53200             | 51705 | 97.19%      |
| Slice Registers | 106400            | 25862 | 24.31%      |
| F7 Muxes        | 26600             | 8123  | 30.54%      |
| F8 Muxes        | 13300             | 3962  | 29.79%      |



Fig. 7: Layout of the complete rapid FMS architecture implemented in 65nm CMOS technology.

## IV. ASIC IMPLEMENTATION

The proposed FMS SoC is implemented in a 65nm CMOS technology for a clock frequency of 100 MHz. Fig. 7 shows the layout, which occupies a total area of $1 \text{ mm}^2$ and consumes 120 mW at a 1.2 V supply voltage. At a clock rate of 100 MHz, the SoC provides sub-5 ppm resolution with a readout time of $5.66 \mu\text{s}$. The design was optimized for timing; optimizing it for power instead would allow lower-power operation at the cost of a longer readout time.

## V. CONCLUSION

This work presented an all-digital FMS for rapid, high-precision MEMS readout applications. The system achieves fast readout and high resolution using a CTMT-256 FFT engine, a max function, and a Candan computation architecture. The design was implemented on an FPGA and functionally verified. The ASIC implementation of the proposed all-digital frequency readout system in a 65nm CMOS technology shows a power consumption of 120 mW at a 100 MHz clock, with a readout time of $5.66 \mu\text{s}$ and sub-5 ppm resolution. Future work will integrate a variable-length FFT controller that reuses the existing hardware, with the aim of enhancing measurement resolution while providing application-dependent reconfigurability.

## REFERENCES

- [1] C. T.-C. Nguyen, "MEMS Technology for Timing and Frequency Control," *IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control*, vol. 54, no. 2, Feb. 2007.
- [2] C. T.-C. Nguyen, "Transceiver front-end architectures using vibrating micromechanical signal processors," in *RF Technologies for Low Power Wireless Communications*. G. I. Haddad, T. Itoh, and J. Harvey, Eds. New York: Wiley IEEE-Press, 2001, pp. 411–461.
- [3] C. T.-C. Nguyen, "Vibrating RF MEMS overview: applications to wireless communications (invited)," in *Proc. SPIE: Micromachin. Microfabr. Process Technol.*, San Jose, CA, vol. 5715, Jan. 22–27, 2005, pp. 11–25.
- [4] C. Li *et al.*, "An FPGA-based interface system for high frequency bulk acoustic-wave (BAW) micro-gyroscopes with in-run automatic mode-matching," *IEEE Transactions on Instrumentation and Measurement*, vol. 69, no. 4, Apr. 2020.
- [5] F. Ayazi, "Multi-DOF inertial MEMS: From gaming to dead reckoning," in *Proc. 16th Int. Solid-State Sensors, Actuat. Microsyst. Conf.*, Jun. 2011, pp. 2805–2808.
- [6] A. T. Zielinski, M. Kalberer, R. L. Jones, A. Prasad, and A. A. Seshia, "Particulate mass sensing with piezoelectric bulk acoustic mode resonators," in *Proc. IEEE Int. Freq. Control Symp. (IFCS)*, May 2016, pp. 1–6.
- [7] A. Prasad, A. A. Seshia, A. T. Zielinski, M. Kalberer, and R. L. Jones, "Studying particulate adsorption by drying droplets on a microfabricated electro-acoustic resonator," in *Proc. Eur. Freq. Time Forum (EFTF)*, Jun. 2014, pp. 28–31.
- [8] Agilent 53230A & RF/Universal Frequency Counter/Timers. [Online]. Available: <https://literature.cdn.keysight.com/litweb/pdf/5990-6283EN.pdf?id=1942617>.
- [9] R. Szplet, P. Kwiatkowski, K. Różyc, Z. Jachna, and T. Sondej, "Picosecond-precision multichannel autonomous time and frequency counter," *Rev. Sci. Instrum.*, vol. 88, no. 12, p. 125101, Dec. 2017.
- [10] Y. Wang, Q. Cao, and C. Liu, "A Multi-Chain Merged Tapped Delay Line for High Precision Time-to-Digital Converters in FPGAs," *IEEE Trans. Circuits Syst. II Express Briefs*, vol. 65, no. 1, pp. 96–100, Jan. 2018.
- [11] P. Chen, Y. Hsiao, Y. Chung, W. X. Tsai, and J. Lin, "A 2.5-ps Bin Size and 6.7-ps Resolution FPGA Time-to-Digital Converter Based on Delay Wrapping and Averaging," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 25, no. 1, pp. 114–124, Jan. 2017.
- [12] Q. Shen *et al.*, "A multi-chain measurements averaging TDC implemented in a 40 nm FPGA," *2014 19th IEEE-NPSS Real Time Conf. RT 2014 - Conf. Rec.*, pp. 6–8, 2015.
- [13] Q. Shen *et al.*, "A 1.7 ps equivalent bin size and 4.2 ps RMS FPGA TDC based on multichain measurements averaging method," *IEEE Trans. Nucl. Sci.*, vol. 62, no. 3, pp. 947–954, 2015.
- [14] X. Qin, L. Wang, D. Liu, Y. Zhao, X. Rong, and J. Du, "A 1.15-ps Bin Size and 3.5-ps Single-Shot Precision Time-to-Digital Converter With On-Board Offset Correction in an FPGA," *IEEE Trans. Nucl. Sci.*, vol. 64, no. 12, pp. 2951–2957, Dec. 2017.
- [15] R. Szplet, D. Sondej, and G. Grzeda, "Subpicosecond-resolution time-to-digital converter with multi-edge coding in independent coding lines," in *2014 IEEE International Instrumentation and Measurement Technology Conference (I2MTC) Proceedings*, 2014, pp. 747–751.
- [16] C. Liu, Y. Wang, P. Kuang, D. Li, and X. Cheng, "A 3.9 ps RMS resolution time-to-digital converter using dual-sampling method on Kintex UltraScale FPGA," *2016 IEEE-NPSS Real Time Conf. RT 2016*, pp. 1–3, 2016.
- [17] Y. Wang and C. Liu, "A 3.9 ps Time-Interval RMS Precision Time-to-Digital Converter Using a Dual-Sampling Method in an UltraScale FPGA," *IEEE Trans. Nucl. Sci.*, vol. 63, no. 5, pp. 2617–2621, 2016.
- [18] Y. Wang and C. Liu, "A 4.2 ps Time-Interval RMS Resolution Time-to-Digital Converter Using a Bin Decimation Method in an UltraScale FPGA," *IEEE Trans. Nucl. Sci.*, vol. 63, no. 5, pp. 2632–2638, 2016.
- [19] H. Chen and D. D.-U. Li, "Multi-channel, Low Nonlinearity Time-to-Digital Converters based on 20nm and 28nm FPGAs," *IEEE Trans. Ind. Electron.*, pp. 1–1, 2018.
- [20] J. Kalisz, "Review of methods for time interval measurements with picosecond resolution," *Metrologia*, vol. 41, no. 1, pp. 17–32, Feb. 2004.
- [21] A. Rivetti, "Fast front-end electronics for semiconductor tracking detectors: Trends and perspectives," *Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip.*, vol. 765, pp. 202–208, Nov. 2014.
- [22] Z. Cheng, X. Zheng, M. J. Deen, and H. Peng, "Recent Developments and Design Challenges of High-Performance Ring Oscillator CMOS Time-to-Digital Converters," *IEEE Trans. Electron Devices*, vol. 63, no. 1, pp. 235–251, Jan. 2016.
- [23] S. Henzler, *Time-to-Digital Converters*, vol. 29. Dordrecht: Springer Netherlands, 2010.
- [24] R. Szplet, "Time-to-Digital Converters," in *Design, Modeling and Testing of Data Converters*, P. Carbone, S. Kiaei, and F. Xu, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014, pp. 211–246.
- [25] P. Napolitano, A. Moschitta, and P. Carbone, "A survey on time interval measurement techniques and testing methods," in *2010 IEEE Instrumentation & Measurement Technology Conference Proceedings*, 2010, pp. 181–186.
- [26] L. Somappa, A. G. Menon, A. K. Singh, A. A. Seshia, and M. Shojaei Baghini, "A Portable System With 0.1-ppm RMSE Resolution for 1–10 MHz Resonant MEMS Frequency Measurement," *IEEE Transactions on Instrumentation and Measurement*, vol. 69, no. 9, Sep. 2020.
- [27] D. W. Tufts and P. D. Fiore, "Simple, effective estimation of frequency based on Prony's method," in *Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. Conf. Proc.*, vol. 5, May 1996, pp. 2801–2804.
- [28] M. D. Kusljevic, "A simple recursive algorithm for frequency estimation," *IEEE Trans. Instrum. Meas.*, vol. 53, no. 2, pp. 335–340, Apr. 2004.
- [29] V. Friedman, "A zero crossing algorithm for the estimation of the frequency of a single sinusoid in white noise," *IEEE Trans. Signal Process.*, vol. 42, no. 6, pp. 1565–1569, Jun. 1994.
- [30] C. Candan, "Analysis and further improvement of fine resolution frequency estimation method from three DFT samples," *IEEE Signal Process. Lett.*, vol. 20, no. 9, pp. 913–916, Sep. 2013.
- [31] S. Djukanović, T. Popović, and A. Mitrović, "Precise sinusoid frequency estimation based on parabolic interpolation," in *Proc. 24th Telecommun. Forum (TELFOR)*, Nov. 2016, pp. 1–4.
- [32] X. Chen, Y. Lei, Z. Lu, and S. Chen, "A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 26, no. 10, Oct. 2018.
- [33] L. Guo, Y. Tang, Y. Dou, Y. Lei, M. Ma, and J. Zhou, "Window memory layout scheme for alternate row-wise/column-wise matrix access," *IEICE Trans. Inf. Syst.*, vol. E96.D, no. 12, pp. 2765–2775, 2013.