

# Nanoelectromechanical Binary Comparator for Edge-Computing Applications

Victor Marot, Manu Bala Krishnan, Mukesh Kumar Kulsreshath,  
Elliott Worsey, Roshan Weerasekera and Dinesh Pamunuwa

*University of Bristol*

E-mail: hk19236@bristol.ac.uk, manubalakrishnan@gmail.com, mukesh.kulsreshath@bristol.ac.uk, elliott.worsey@bristol.ac.uk, roshan.weerasekera@bristol.ac.uk, dinesh.pamunuwa@bristol.ac.uk

**Abstract**—Bitwise comparison is a fundamental operation in many digital arithmetic functions and is ubiquitous in both datapath and control elements; for example, many machine learning algorithms depend on binary comparison. This work proposes a new class of binary comparator circuit using 4-terminal nanoelectromechanical (NEM) relays that uses just 6 devices, compared to 9 transistors in CMOS implementations. Moreover, NEM implementations have near-zero standby power and are capable of withstanding much higher temperatures, up to 300°C, and radiation levels, well over 1 Mrad absorbed dose, conditions which are common across many industrial edge applications. 1-bit magnitude and equality comparators, each comprising two in-plane silicon 4-terminal relays, were fabricated on a silicon-on-insulator substrate and electrically characterized for proof of concept, the first such demonstration. Using the 1-bit comparators as building blocks, a scalable tree-based topology is proposed to implement higher-order comparators, resulting in a $\approx 47\%$ reduction in device count over a CMOS implementation for a 64-bit comparator. Circuit-level simulations of the comparators using accurate device models show that a single operation consumes at most 21 fJ, a 9-fold reduction over the best CMOS offering in an equivalent process node.

**Index Terms**—binary comparators, low-power, digital logic, mechanical computing, NEMS, edge-computing, high-temperature, binary decision trees, radiation-hard

## I. INTRODUCTION

Edge computing is an alternative paradigm to cloud-based computing that is key to fulfilling the full promise of the industrial IoT, to avoid unnecessary data transfer, speed up critical responses, and reduce energy consumption and storage costs [1]. The requirements for edge computing tend to centre around tight integration of sensing and processing, data classification, monitoring and supervisory functions, often under energy constraints and harsh operating conditions such as high temperatures and radiation levels [2], [3]. It is easier to meet these disparate requirements with a more-than-Moore approach, which combines other technologies, such as sensors, MEMS and NEMS, with CMOS [4].

In this work, we describe a new class of modular magnitude and equality comparator circuit built from two four-terminal (4-T) nanomechanical relays. We have fabricated prototypes on a silicon-on-insulator (SOI) substrate and carried out measurements to demonstrate proof of concept of operation. We then propose a new class of 1-bit NEM binary comparators centered around this base circuit which use a total of 6 devices, compared to at least 9 transistors for the equivalent functionality in CMOS. Next, we show how this can be scaled up to implement a 64-bit comparator, with further savings using logic optimisation to use $\approx 47\%$ fewer devices than its CMOS equivalent. Using simulations, we extract the worst-case energy consumption per operation to be 21 fJ, a reduction of $9\times$ compared to the best CMOS offering in an equivalent node. This energy saving is due in part to our architecture having fewer devices, but a much bigger effect is a smaller effective switching activity factor. The 4-T relays lend themselves to circuit architectures where primary inputs drive the gate, source and body terminals, which results in less activity on intermediate nodes compared to transistor-based circuit styles. The worst-case propagation delay is greater due to the higher mechanical latency of relays, and is 2.66 $\mu$s compared to 642 ps for an equivalent architecture. Many IoT applications, however, have low throughput requirements, with low energy consumption being the main requirement.

The magnitude and equality comparator proposed here can be combined with other NEMS-based circuits or CMOS in the same chip where the silicon relays can be fabricated alongside transistors as a front-end-of-line [5] or back-end-of-line process for a more-than-Moore approach [4], [6]. As binary comparators are ubiquitous datapath and control elements in modern computing, ranging from Content-Addressable Memory (CAM) [7], to Hamming weight comparison [8], to Multiple-Input Multiple-Output (MIMO) detectors [9], to lightweight machine learning models based on propositional logic such as the Tsetlin Machine [10] and binary decision trees [11], [2], the proposed circuit has significant potential to reduce energy and improve harsh environment capability in edge applications.

## II. RELATED WORK: BINARY COMPARATORS

Comparators are divided into two categories: magnitude and equality comparators. Magnitude comparators check if the operand $A$ is greater than or less than the other operand $B$, i.e. if $A > B$ or $A < B$. These two forms are logically equivalent, only differing in the ordering of the operands. Equality comparators produce a '1' when the inputs are equal, in essence behaving as an XNOR gate. The truth tables for both categories are shown in Table I.

In this paper, the term binary comparator is used to refer to a circuit that combines the magnitude and/or equality comparator functionality, to achieve at least two distinct comparison functions, allowing deduction of the third function.

TABLE I: Truth table for binary comparator operations.

| A | B | $A = B$ | $A < B$ | $A > B$ |
|---|---|---------|---------|---------|
| 0 | 0 | 1       | 0       | 0       |
| 0 | 1 | 0       | 1       | 0       |
| 1 | 0 | 0       | 0       | 1       |
| 1 | 1 | 1       | 0       | 0       |
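The relationship between the three columns of Table I can be sketched in a few lines of Python. The function name `compare_1bit` is an illustrative assumption, not something defined in this work; the loop checks that exactly one of the three outputs is asserted for every input, which is why any two comparison functions allow the third to be deduced.

```python
def compare_1bit(a: int, b: int) -> dict:
    """Return the three 1-bit comparison outputs listed in Table I."""
    eq = int(a == b)           # equality behaves as an XNOR gate
    lt = (1 - a) & b           # A < B only when A = 0 and B = 1
    gt = a & (1 - b)           # A > B only when A = 1 and B = 0
    return {"eq": eq, "lt": lt, "gt": gt}


for a in (0, 1):
    for b in (0, 1):
        r = compare_1bit(a, b)
        # The three outcomes are mutually exclusive and exhaustive.
        assert r["eq"] + r["lt"] + r["gt"] == 1
        print(a, b, r["eq"], r["lt"], r["gt"])
```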

### a) Priority Encoder-based Comparators

The architecture proposed in [12] follows the basic principle of priority encoding for binary comparison, employing a pre-encoding stage to compare individual bits and an encoding stage to locate the most significant inequality. Lam and Tsui proposed an alternative to increase the speed [13], but at the cost of more devices, the highest of all the reviewed implementations (see Table II).

### b) BCL-based Comparators

Kim and Yoo proposed a scheme, Bitwise Competition Logic (BCL), to find the highest priority bit that leads to an inequality as well as the nature of the inequality [14]. This architecture yields the lowest device count of all implementations discussed here. However, the use of dynamic logic results in higher power consumption, as can be seen in Table II.

### c) CLA-based Tree Comparators

Binary comparators based on tree structures have been implemented in [15] and improved in [16] to reduce switching activity. The basic concept is that a Carry-Look-Ahead (CLA) adder is used to compute the addition of the two operands, and the carry-out signal is used to distinguish between $A \geq B$ and $A < B$. A dual-mode logic [17] alternative was proposed in [18] to create a more efficient circuit, at the cost of a 25% increase in device count (see Table II).

### d) Non-CLA Tree-based Comparators

The architecture proposed in [19] is based on a similar observation as in [15], that a 2-bit binary comparison is analogous to the carry generation of a binary addition. Through Boolean manipulation, they obtained equations (1) and (2), with $G_{[63:0]}$ returning $A_{[63:0]} < B_{[63:0]}$ and $EQ_{[63:0]}$ returning $A_{[63:0]} = B_{[63:0]}$, where $G_k$ evaluates the Boolean expression $A_k < B_k$ at bit $k$ and $EQ_m$ evaluates $A_m = B_m$ at each more significant bit $m \geq k + 1$.

$$G_{[63:0]} = G_{63} + \sum_{k=0}^{62} (G_k \cdot \prod_{m=k+1}^{63} EQ_m) \quad (1)$$

$$EQ_{[63:0]} = \prod_{m=0}^{63} EQ_m. \quad (2)$$

A block diagram of an 8-bit binary comparator of that scheme is reproduced in Fig. 1. To obtain the input needed for equations (1) and (2), a pre-encode stage is employed, made of single-bit binary comparators themselves comprising an equality (EQ) and a magnitude comparator (MAG) with an inverter for one of the inputs for a total of 9 transistors per bit.
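Equations (1) and (2) can be evaluated directly in software as a sanity check. The following is a minimal sketch, assuming operands held as Python integers and generalising the bit-width from 64 to a parameter `n`; the helper names `pre_encode` and `compare_flat` are hypothetical and the code is a flat evaluation of the equations, not a model of the circuit in [19].

```python
def pre_encode(a: int, b: int, n: int):
    """Per-bit G_k = (A_k < B_k) and EQ_k = (A_k == B_k), LSB first."""
    g, eq = [], []
    for k in range(n):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        g.append(int(ak < bk))
        eq.append(int(ak == bk))
    return g, eq


def compare_flat(a: int, b: int, n: int = 64):
    """Evaluate equations (1) and (2); returns (A < B, A == B)."""
    g, eq = pre_encode(a, b, n)
    less = g[n - 1]                       # the G_63 term when n = 64
    for k in range(n - 1):
        # G_k only counts if every more significant bit is equal.
        less |= g[k] & int(all(eq[m] for m in range(k + 1, n)))
    equal = int(all(eq))                  # product of all EQ_m
    return less, equal


print(compare_flat(3, 5, n=8))  # (1, 0)
```

The nested product in equation (1) is what the tree structure described next avoids recomputing for every bit position.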

The bit-width of the binary comparator is increased by combining the single-bit networks according to equations (1) and (2) in a tree structure, which returns separate Boolean outputs for equality and magnitude comparisons. The two comparison functions use parallel trees to test for $A < B$ and $A = B$. To reduce the device count, the magnitude comparator tree uses alternating stages that produce true and inverted outputs for a 2-bit version of equation (1), as shown in Fig. 1. Similarly, the equality comparator uses the same principle of alternating stages but implements a different logic function using NAND and NOR operations to satisfy equation (2).

Fig. 1: Block diagram of an 8-bit version of the binary comparator architecture proposed in [19].

A different approach to the tree structure has been proposed in [20] to scale from 8-bit to 64-bit, leading to improvements in device count and a marginal decrease in power consumption over [19]. The architectures proposed in [19] and [20] are amongst the most efficient of the architectures in the literature while also having amongst the lowest device counts.

TABLE II: Device count and worst-case energy consumption estimation of the reviewed and proposed 64-bit binary comparator implementations.

| Architecture | Estimated Worst-Case Energy (fJ) | CMOS Count      | NEM Count      | Count Reduction |
|--------------|---------------------------------|-----------------|----------------|-----------------|
| [12]         | 1030                            | 1640            | 1624           | $\approx 1\%$   |
| [13]         | 1410                            | 3386            | 2580           | $\approx 24\%$  |
| [14]         | 2200                            | 964             | $\approx 416$  | $\approx 57\%$  |
| [16]         | 717                             | 1365            | 1365           | 0%              |
| [18]         | 194                             | $\approx 1700$  | $\approx 1700$ | 0%              |
| [19]         | 786                             | 1206            | 636            | $\approx 47\%$  |
| [20]         | 701                             | 1136            | 1052           | $\approx 7\%$   |
| Proposed     | 21                              | N/A             | 636            | 0%              |

Table II shows a summary of the device count and a rough energy consumption estimation based on the worst-case power consumption and delay for 64-bit implementations of the different architectures. Results for [12], [13], [14], [16] and [19] were obtained from the values presented in [19] for the 180 nm process, the closest to the process node used for the 64-bit NEM implementation presented in this research. From this, results for other implementations were estimated based on their reported improvements over known architectures presented in [19]. It is important to note that the energy consumption estimation here only gives a means of comparison for the architecture, as not all the estimations take into account parameters such as process node. The implementation in [15] was omitted from Table II as their improvements in [16] provide reduced power consumption for an identical device count.

## III. NEM TECHNOLOGY

Electrostatically operated NEM relays have a beam that deflects under an applied electrostatic force to make electrical contact with a stationary electrode to define the on state [21]. The electrostatic force is due to a voltage applied between a control electrode, the gate, and the flexible beam. When the voltage is removed, the spring force in the deflected beam causes it to pull out of contact, acting against the adhesion forces between the surfaces in contact. Different types of relays have been proposed for different functions (e.g. [22]–[24]) and the most flexible type of switch to build logic circuits is one where the control voltage is decoupled from the data signal [25]. One embodiment of this type of switch is a 4-terminal relay as shown in Fig. 2. The control voltage is applied between the gate (G) and body (B) terminals, and the data signal flows between a second beam (S) and drain (D), where the two beams are structurally connected but electrically isolated. A Boolean expression describing the behaviour of this 4-T device is given in equation (3).

$$D = S \cdot (B \oplus G) \quad (3)$$

While different architectures have been proposed for 4-T relays [26]–[29], for this work we have implemented the in-plane switch architecture first described by Reynolds et al. [25] and later demonstrated in a foundry process [5] due to its relatively simple fabrication process with just two patterning steps, and its single moving contact, which helps with reliability [25]. We have modified this design to achieve a pull-in voltage under 10 V and fabricated it on an SOI substrate with device and buried oxide layer thicknesses of 1.5 $\mu\text{m}$ and 1 $\mu\text{m}$, cantilever beam widths of 1.8 $\mu\text{m}$, a straight hinge on the top (source) beam and serpentine hinge with one fold on the bottom (body) beam, both 1 $\mu\text{m}$ wide, and actuation and contact gaps of 400 nm and 300 nm (see Fig. 2b). First, a positive-tone resist was deposited and patterned using e-beam lithography to define the switch structure, which was transferred to the silicon device layer through a dry etch. Next, a negative-tone resist (AZ nLOF 2035) was deposited and patterned to define the plug, to serve as an insulating mechanical coupler between the two beams. The measured pull-in voltage for this device is 9 V with no detectable leakage current (see Fig. 2c). For the proof-of-concept prototyping carried out here, we did not use a special contact coating, and current flow occurred through the highly doped silicon device layer. This meant that the contact resistance at the tip was relatively high, and the number of hot switching cycles was limited. Thus, we used a drain bias of 10 V with a current limit of 10 nA to limit Joule heating and the resulting contact degradation.
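The link between equation (3) and the measured actuation voltage can be illustrated with a simple behavioural sketch. It assumes an ideal relay that closes as soon as the gate-to-body voltage magnitude reaches the reported 9 V pull-in voltage, and it assumes logic levels of 0 V and 10 V (the ramp range used in the measurements). Hysteresis between pull-in and pull-out, stiction and contact resistance are deliberately ignored, and the function names are hypothetical.

```python
V_PULL_IN = 9.0   # volts; measured pull-in voltage of the prototype
V_HIGH = 10.0     # volts; assumed logic-high level


def relay_closed(v_gate: float, v_body: float,
                 v_pull_in: float = V_PULL_IN) -> bool:
    """Ideal actuation: the beam pulls in once |V_G - V_B| reaches pull-in."""
    return abs(v_gate - v_body) >= v_pull_in


def drain_logic(s: int, g: int, b: int) -> int:
    """Equation (3), D = S . (B xor G), evaluated through the voltage model."""
    return s & int(relay_closed(g * V_HIGH, b * V_HIGH))


# The voltage model agrees with the Boolean expression for all inputs.
for s in (0, 1):
    for g in (0, 1):
        for b in (0, 1):
            assert drain_logic(s, g, b) == s & (b ^ g)
```

Note that equation (3) reports a logic 0 when the relay is open, whereas the physical drain is then floating; circuits built from these relays therefore need a second path to drive the output in that case.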

## IV. NEM-BASED BINARY COMPARATOR

This section introduces a new class of NEM-based binary comparators that use a modular core circuit made up of two 4-T relays. We have fabricated and measured this circuit to show proof-of-concept for a single bit. We then show how a 64-bit comparator can be constructed by combining bit slices and verify its operation through simulations.



Fig. 2: (a) Schematic symbol of a 4-T NEM switch, (b) SEM of a fabricated 4-T device and (c) measurement of the source-drain current during actuation of the fabricated device.

### A. Versatile Dual 4-T Relay Circuit

The generalised structure of our modular core circuit is shown in Fig. 3. While Reynolds et al. used this structure to demonstrate a demultiplexer by treating the *Out* pin as the input signal [25], this circuit implements the Boolean function in equation (4).

$$Out = (G \oplus B_1) \cdot S_1 + (G \oplus B_2) \cdot S_2 \quad (4)$$

By configuring the inputs $G$, $S_1$, $S_2$, $B_1$ and $B_2$ differently, five distinct functions of 1, 2 or 3 inputs can be realised: inversion (INV), AND, MAG, EQ and multiplexing (MUX). The pin mappings are defined in Table III and have been derived by substituting $A$, $B$ and $S$ (for select) in equation (4).

TABLE III: Pin mapping to produce 5 functions, INV, AND, MAG, EQ and MUX using the same dual 4-T circuit.

| Pin   | INV | AND | MAG | EQ             | MUX |
|-------|-----|-----|-----|----------------|-----|
| $S_1$ | 1   | $A$ | $A$ | 1              | $A$ |
| $B_1$ | 1   | 0   | 1   | $\overline{B}$ | 1   |
| $G$   | $A$ | $B$ | $B$ | $A$            | $S$ |
| $B_2$ | 0   | 1   | 0   | $B$            | 0   |
| $S_2$ | 0   | 0   | 0   | 0              | $B$ |
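The pin mappings of Table III can be checked by substituting them into equation (4). The sketch below does this exhaustively in Python; `dual_4t` and the five wrapper functions are illustrative names, and the model is purely Boolean (it does not capture floating nodes or timing).

```python
from itertools import product


def dual_4t(g: int, s1: int, b1: int, s2: int, b2: int) -> int:
    """Equation (4): Out = (G xor B1).S1 + (G xor B2).S2."""
    return ((g ^ b1) & s1) | ((g ^ b2) & s2)


# Pin mappings from Table III, expressed in terms of the logical inputs.
def inv(a):        return dual_4t(g=a, s1=1, b1=1, s2=0, b2=0)
def and_(a, b):    return dual_4t(g=b, s1=a, b1=0, s2=0, b2=1)
def mag(a, b):     return dual_4t(g=b, s1=a, b1=1, s2=0, b2=0)      # A > B
def eq(a, b):      return dual_4t(g=a, s1=1, b1=1 - b, s2=0, b2=b)  # XNOR
def mux(a, b, s):  return dual_4t(g=s, s1=a, b1=1, s2=b, b2=0)      # S=0 selects A


for a, b, s in product((0, 1), repeat=3):
    assert inv(a) == 1 - a
    assert and_(a, b) == (a & b)
    assert mag(a, b) == int(a > b)
    assert eq(a, b) == int(a == b)
    assert mux(a, b, s) == (b if s else a)
```

With this mapping the MUX passes $A$ when $S = 0$ and $B$ when $S = 1$, and MAG evaluates $A > B$.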

It is important to note that these pin arrangements ensure a strong pull-up or pull-down through a direct connection from *Out* to either $S_1$ or $S_2$ for all functions. Long sequential stages should be broken up and interspersed with buffers (pairs of inverters which can be constructed from the same modular 4-T gate or two 3-T relays) to regenerate the signal and restore sharp transitions. However, it should be noted that with NEM technology, slow rise and fall times do not result in increased power consumption as would be the case with CMOS. Transistors can be weakly on below the threshold voltage, but relays turn on and off abruptly at the pull-in and pull-out voltages.

Fig. 3: (a) Schematic and (b) SEM image of the versatile dual 4-T structure.

### B. Test Results

Sample circuits of the dual 4-T structure were fabricated using the same process as the individual devices for proof of concept of logic functionality. The expected switch states for the two relays for MAG and EQ functions for different input combinations are shown in Table IV. The tests were performed by observing the flow of current through the source beams of the devices.

TABLE IV: Switch states of relays for equality and magnitude comparison.

| AB | MAG R1 | MAG R2 | EQ R1 | EQ R2 |
|----|--------|--------|-------|-------|
| 00 | ON     | OFF    | ON    | OFF   |
| 01 | OFF    | ON     | OFF   | ON    |
| 10 | ON     | OFF    | OFF   | ON    |
| 11 | OFF    | ON     | ON    | OFF   |

#### 1) Magnitude Comparator (MAG)

The first pattern tested was $AB = 01$, where the voltages on the common gate terminal ($B$) and body terminal of R1 ($V_{cc}$) were ramped from 0 to 10 V with a static bias of 10 V applied to Out, to be able to monitor any current flowing through any of the sources. This causes the relay R2 to pull in at $\approx 9$ V. Although the lack of a contact material resulted in low source current, the pull-in event can clearly be seen in Fig. 4a. The source current of R1 remains zero, showing the circuit works correctly. Next, pattern $AB = 11$ was tested by ramping $B$ and $V_{cc}$ back down to 0 and increasing to 10 V again, with $A$ held at 10 V. Due to the poor silicon-to-silicon contact, R2 did not pull out but remained stuck. Nevertheless, no current flows from $A$ to Out, showing that R1 remains off as required, while current does flow from Out to ground via R2, providing the required logic '0'. After some time, R2 did pull out, and the next test, for $AB = 10$, was performed by ramping $V_{cc}$ and $A$ from 0 to 10 V with $B$ and Out grounded. For this test, the pull-in event was not recorded, but the correct functionality can be observed, as a current flows from $A$ to Out, providing the required logic '1'. After this test, $A$ and $V_{cc}$ were ramped down, and $V_{cc}$ was ramped from 0 to 10 V with $A$ grounded and a static bias of 10 V at Out to test the $AB = 00$ pattern. R1 remained stuck, so pull-out and subsequent pull-in could not be observed, but current flows from Out to $A$, giving the required logic '0'.



Fig. 4: Current measured at the source and out pins of a fabricated 4-T modular core circuit configured as a magnitude comparator for different input patterns: (a)  $AB = 01$  (b)  $AB = 11$  (c)  $AB = 10$  and (d)  $AB = 00$ .

#### 2) Equality Comparator (EQ)



Fig. 5: Current measured at the sources and output (y-axis) of a fabricated 4T modular core circuit arranged as an equality comparator, with (a)  $AB = 01$  (b)  $AB = 11$  (c)  $AB = 10$  and (d)  $AB = 00$ . Voltages being ramped during measurements are labeled on the x-axis.

The first pattern tested was $AB = 01$, where $B$ was ramped from 0 to 10 V with a static 10 V bias voltage applied to Out, and $A$ and $\bar{B}$ grounded, as shown in Fig. 5a. Here too, the pull-in event could not be recorded due to the beam getting stuck in a preliminary test, but current can be seen to flow between Out and ground through R2, which provides the correct logic '0' at the output. To verify that R1 remained switched off, the node labelled $V_{cc}$ was grounded, and no current could be seen to flow into that node from Out. Some minutes later, R2 had pulled out, and the test for $AB = 00$ was performed by ramping $\bar{B}$ with a static 10 V bias applied to Out, and $A$ and $B$ grounded. To check that relay R1 had turned on, $V_{cc}$ was grounded, and current is seen to flow from Out to ground via relay R1 for correct functionality, while no current was observed flowing into ground via the source of R2. In actual operation $V_{cc}$ would be logic high, which would be transmitted to Out via R1, but for testing purposes it was more convenient to ground $V_{cc}$ and apply a bias to Out, as it could also be verified that R2 remained off. Next, the $AB = 10$ pattern was tested by ramping $A$ and $\bar{B}$ from 0 to 10 V with a static 10 V bias voltage applied to Out. Here, the pull-in event for R2 was recorded at 9.9 V, though with a current that reached 0.5 nA. Nevertheless, correct functionality is observed as a logic '0' is presented at Out via R2, and no current flows between Out and ground via R1. Finally, the $AB = 11$ pattern was tested by ramping $A$ and $B$ from 0 to 10 V with a static 10 V bias voltage applied to Out. As with the $AB = 00$ pattern, the bias was applied to Out rather than $V_{cc}$ in order to simultaneously test that R1 was on and R2 remained off. This was verified by the current measurements.

### C. 64-bit NEM Binary Comparator Design

Here, we investigate how a 64-bit binary comparator can be constructed using NEM technology. For this work, we have estimated the NEM device count for each of the architectures introduced in Section II, and collated the results in the NEM Count column of Table II.

#### a) Priority-Encoding-based Comparator

The priority-encoder-based architectures proposed in [12] and [13] have two of the highest device counts amongst the considered architectures. After optimizing for a NEM implementation, the device count of [12] was lowered by about 1%. On the other hand, [13] provided more optimization opportunities, with a device count reduction of 24%. The total devices still numbered 2580, due to the high initial count.

#### b) BCL and CLA based Comparators

The implementations proposed in [14], [15], [16] and [18] all base their designs on a dynamic style logic which is not well suited to NEM technology due to the requirements for periodic refresh and the resulting high switching count, given that the number of hot switch cycles in NEM switches is likely to be low compared to transistors. Additionally, the redundant logic for dual-mode operation in [18] also leads to an increase in device count. Estimated device counts for these architectures are provided in Table II.

#### c) Non-CLA Tree-based Comparators

The single-bit binary comparator constituting the pre-encode stage proposed in [19], while already well optimized, still provides an obvious optimization opportunity when implemented in NEM technology. Replacing the equality and magnitude comparators by the NEM-optimized circuit reduces the device count from 9 to 6, leading to a substantial decrease of 192 devices in the 64-bit implementation. While the magnitude comparator coupling circuits of [19] provide no obvious optimization opportunities, the Boolean expression of equation (1) can be manipulated to reduce the device count. The coupling stage presented in [19] implements the Boolean expression

$$G_{[1:0]} = G_1 + EQ_1 \cdot G_0 \quad (5)$$

where $G_1$ corresponds to $A_1 < B_1$, $EQ_1$ to an equality at the most significant bit and $G_0$ to $A_0 < B_0$ at the next bit position (least significant bit here). For each comparison, three mutually exclusive states exist: greater than (G), equal to (EQ) or less than (L). From this it is possible to deduce that $G_1 \equiv G_1 \cdot \overline{EQ}_1$, as $\overline{EQ}_1$ cannot equate to a logical '0' when $G_1$ is a logical '1', allowing equation (5) to be expanded to equation (6) following the format of a MUX equation.

$$G_{[1:0]} = \overline{EQ}_1 \cdot G_1 + EQ_1 \cdot G_0 \quad (6)$$

Furthermore, as the MUX does not need inverted inputs as required by the CMOS coupling stages proposed in [19], a MUX-based coupling logic circuit can be used as a replacement for both of the alternating magnitude coupling stages. The same applies to the equality comparison, requiring only an AND gate to merge all the single-bit equality comparator outputs. These improvements in the pre-encode stage as well as the coupling stage of [19] allow a reduction of 47% in device count, making it the second lowest device count of the NEM implementations behind the BCL-based comparator.
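The step from equation (5) to equation (6) relies on $G_1$ and $EQ_1$ never being '1' together. A short exhaustive check makes this explicit; the sketch below is illustrative only, with hypothetical function names, and it enumerates all 2-bit operand pairs rather than all values of the intermediate signals.

```python
from itertools import product


def bit_states(a: int, b: int):
    """(G, EQ) for one bit position, with G meaning A < B as in equation (5)."""
    return int(a < b), int(a == b)


def couple_or(g1: int, eq1: int, g0: int) -> int:
    """Equation (5): G1 + EQ1.G0."""
    return g1 | (eq1 & g0)


def couple_mux(g1: int, eq1: int, g0: int) -> int:
    """Equation (6): EQ1 selects between G1 (EQ1 = 0) and G0 (EQ1 = 1)."""
    return ((1 - eq1) & g1) | (eq1 & g0)


for a1, b1, a0, b0 in product((0, 1), repeat=4):
    g1, eq1 = bit_states(a1, b1)
    g0, _ = bit_states(a0, b0)
    expected = int((2 * a1 + a0) < (2 * b1 + b0))
    assert couple_or(g1, eq1, g0) == couple_mux(g1, eq1, g0) == expected
```

The two forms differ only for the combination $G_1 = EQ_1 = 1$, which cannot be produced by any operand pair, so the MUX form is a valid replacement.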

#### d) 64-bit NEM-based Binary Comparator

As summarised in Table II, an in-depth study of CMOS implementations suitable for NEM optimization reveals that modifications to the design proposed in [19] give the lowest device count (excepting dynamic circuits), making it the preferred approach to adopt in the design of a 64-bit NEM-based comparator. The proposed pre-encode stage follows the same structure proposed in [19], relying on a single-bit binary comparator comprising a single-bit magnitude and equality comparator as well as an inverter on the input $B$ for the equality comparison. This implementation differs from [19] in that the magnitude comparator computes $A > B$ rather than $A < B$. As previously mentioned, these operations are logically equivalent and obtained by swapping inputs for $A$ and $B$.

Fig. 6a shows a schematic of the NEM-based single-bit binary comparator serving as a pre-encode stage, with  $i$  the bit position in a binary number. Based on equation (6), the Boolean function of the coupling network of [19] can be reformulated in the form of a MUX-based equation. It can then be determined that the logic function carried out by the two variants of the 6-transistor coupling network can be implemented by a MUX (one MUX each per variant). As the dual 4-T gate is a very efficient implementation of a MUX, the tree structure of [19] can be implemented with a saving of four devices per coupling circuit. This leads to a  $3 \times$  reduction of device count for the magnitude comparator coupling network. Additionally, implementing the equality comparator coupling network using 4-T relay-based AND gates reduces the device count by half when compared to its CMOS implementation proposed in [19]. Fig. 6b shows the schematic of the coupling circuit for both the magnitude comparator and the equality comparator, corresponding to the top and bottom halves of the schematic respectively.
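The per-stage savings described above can be tallied to see how they add up to the totals in Table II. This is a back-of-the-envelope sketch under two assumptions that are not stated explicitly in the text: that the tree contains one magnitude and one equality coupling circuit per internal node of a full binary tree (63 nodes for 64 bits), and that each CMOS equality coupling gate uses 4 transistors (inferred from the stated halving relative to a 2-relay AND gate).

```python
N_BITS = 64
COUPLING_NODES = N_BITS - 1      # assumed: full binary tree over 64 leaves

# Devices per circuit block.
CMOS_PRE_ENCODE, NEM_PRE_ENCODE = 9, 6   # single-bit comparator
CMOS_MAG_COUPLE, NEM_MAG_COUPLE = 6, 2   # coupling network vs dual 4-T MUX
CMOS_EQ_COUPLE, NEM_EQ_COUPLE = 4, 2     # assumed 4-transistor gate vs AND

cmos_total = (CMOS_PRE_ENCODE * N_BITS
              + (CMOS_MAG_COUPLE + CMOS_EQ_COUPLE) * COUPLING_NODES)
nem_total = (NEM_PRE_ENCODE * N_BITS
             + (NEM_MAG_COUPLE + NEM_EQ_COUPLE) * COUPLING_NODES)
reduction = 1 - nem_total / cmos_total

print(cmos_total, nem_total, f"{reduction:.1%}")  # 1206 636 47.3%
```

Under these assumptions the tally reproduces the 1206 and 636 device counts listed for [19] in Table II and the quoted $\approx 47\%$ reduction.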

A 4-bit example of the full architecture is shown in Fig. 7. This can be expanded to 64 bits in a straightforward way.
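The scaling from 4 to 64 bits can be illustrated with a functional model of the tree: a pre-encode stage produces one ($A > B$, $A = B$) pair per bit, and adjacent pairs are then merged with a MUX for magnitude and an AND for equality until a single pair remains. The sketch below is a behavioural approximation only; the pairing order, the power-of-two restriction and the function names are assumptions, and no timing or electrical behaviour is modelled.

```python
import random


def pre_encode(a: int, b: int, n: int):
    """One (GT, EQ) pair per bit position, least significant bit first."""
    pairs = []
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        pairs.append((ai & (1 - bi), int(ai == bi)))   # MAG and EQ per bit
    return pairs


def compare_tree(a: int, b: int, n: int = 64):
    """Return (A > B, A == B) for n-bit operands; n must be a power of two."""
    nodes = pre_encode(a, b, n)
    while len(nodes) > 1:
        merged = []
        for i in range(0, len(nodes), 2):
            (gt_lo, eq_lo), (gt_hi, eq_hi) = nodes[i], nodes[i + 1]
            gt = gt_lo if eq_hi else gt_hi   # MUX coupling, as in equation (6)
            eq = eq_hi & eq_lo               # AND coupling for equality
            merged.append((gt, eq))
        nodes = merged
    return nodes[0]


# Spot check against Python's own integer comparison.
random.seed(0)
for _ in range(1000):
    x, y = random.getrandbits(64), random.getrandbits(64)
    assert compare_tree(x, y) == (int(x > y), int(x == y))
```

For 64 bits this loop runs for six merge levels, and the third comparison result, $A < B$, follows when both returned outputs are 0.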

A simulation of the 64-bit NEM-based architecture was carried out using accurate models of the NEM devices based on order reduction of finite-element models (after [6], [21], [30]) and is presented in Fig. 8. The simulation parameters were chosen to highlight the input bit patterns that result in the highest static and dynamic power consumption as well as the slowest propagation delay for both comparison operations.

Fig. 6: Schematic of the (a) pre-encode circuit and (b) coupling circuit of the NEM-based binary comparator.

Fig. 7: Block diagram of a 4-bit NEM binary comparator following the proposed architecture.



Fig. 8: Simulation output of the slowest case for both comparison operations as well as the highest dynamic and static powers for the 64-bit NEM-based binary comparator.

The input combination that gives the slowest equality comparison is obtained when $B_{[63:0]}$ starts at its maximum value and transitions to 0 while $A_{[63:0]}$ stays at 0. This pattern results in the longest path of devices actuated in series, for a critical path propagation delay of 2.66 μs. On the other hand, the slowest magnitude comparison occurs when transitioning from $B_{[63:0]}$ = 0x800000008000808B to 0 while $A_{[63:0]}$ remains at 1, taking approximately 2.53 μs for the operation, making it the quicker of the two operations. The highest dynamic power consumption is obtained when all bits of $B_{[63:0]}$ transition from a 0 to a 1 while $A_{[63:0]}$ stays at 0, consuming 7.90 nW for the operation executed in around 1 μs. Table II shows the worst-case energy consumption for a single operation from the simulated architecture, calculated from the worst-case delay and worst-case power consumption. The worst-case energy consumption for a single operation is found to be 21 fJ, which is less than 1/37th of its CMOS equivalent and 1/9th of the energy consumption of the lowest of all reviewed CMOS implementations. This saving is the result of the lower device count as well as a significantly lower effective switching activity factor enabled by the attributes of the 4-T relay, where primary inputs can be connected directly to all four nodes without a threshold drop. A single 4-T device based on the fabricated prototype design is found to occupy an area of $\approx 373\ \mu\text{m}^2$. Extrapolating from this, the device footprint of the 64-bit NEM-based binary comparator is calculated to be $\approx 0.237\ \text{mm}^2$, compared to $\approx 0.450\ \text{mm}^2$ without the optimisation that resulted in device count improvements.
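The headline figures in this paragraph can be reproduced with simple arithmetic. The sketch below multiplies the worst-case power by the worst-case delay, compares the result with the Table II energies for [19] and [18], and scales the single-device area by the device counts. Using the 1206-device count of [19] as the unoptimised case is an inference from Table II rather than something stated explicitly.

```python
P_WORST_W = 7.90e-9      # worst-case dynamic power, 7.90 nW
T_WORST_S = 2.66e-6      # worst-case propagation delay, 2.66 us

energy_fj = P_WORST_W * T_WORST_S * 1e15
ratio_vs_19 = 786 / energy_fj     # CMOS equivalent [19], Table II
ratio_vs_18 = 194 / energy_fj     # lowest-energy CMOS entry [18], Table II

DEVICE_AREA_UM2 = 373             # area of one 4-T relay
area_optimised_mm2 = 636 * DEVICE_AREA_UM2 / 1e6
area_unoptimised_mm2 = 1206 * DEVICE_AREA_UM2 / 1e6   # assumed device count

print(f"energy  = {energy_fj:.1f} fJ")
print(f"ratios  = {ratio_vs_19:.1f}x, {ratio_vs_18:.1f}x")
print(f"area    = {area_optimised_mm2:.3f} mm^2 vs {area_unoptimised_mm2:.3f} mm^2")
```

The product of worst-case power and worst-case delay is a conservative bound, since the two worst cases arise from different input patterns.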

## V. CONCLUSIONS AND DISCUSSION

In this paper we proposed a fully mechanical binary comparator, which can be used in edge applications that require ultra-low power consumption, including environments with high temperatures and radiation levels. The core of the comparator is a dual 4-T circuit that is versatile and can be used to generate five distinct boolean functions by configuring the inputs differently, including magnitude and equality functions that are at the heart of binary comparators, and multiplexing, essential for combining multiple stages when scaling up the circuit to higher bit widths. An in-depth investigation was carried out to identify the best architecture for constructing binary comparators for higher bit widths. Through optimization that utilised the circuit efficiencies afforded by NEM technology, a 47% reduction in device count was achieved for this architecture compared to its CMOS implementation. A 64-bit binary comparator was constructed and simulated using accurate device models for the relays to extract the worst-case values for propagation delay, 2.66 μs, and dynamic power consumption, 7.90 nW, giving a worst-case energy consumption of around 21 fJ.

Notably, prototypes of the core dual 4-T circuit were fabricated on silicon-on-insulator substrates and tested for implementation of the magnitude and equality functions. While the prototype circuits had limited switch life due to the lack of a contact coating, measurements were obtained to provide proof of concept of both the magnitude and equality functions, showing the potential to realise fully mechanical implementations. Contact coatings offer a route to longer switch life: for example, carbon-based contact coatings have shown great promise in miniaturised switches [22], [23], while ruthenium is used to obtain billions of cycles in micro-scale relays [31]. The total die area occupied by a 64-bit comparator based on a conservative design used for prototyping was $\approx 0.237\ \text{mm}^2$, which could be significantly reduced by scaling the device (e.g., see [5]). Edge applications often have stringent environmental and energy constraints that cannot be readily met by existing electronic solutions. This work has shown the potential of nanomechanical technology as a solution.

## REFERENCES

- [1] M. Muzelak and T. Skovranek, "Edge computing implementation of safety monitoring system in frame of IIoT," in *Proc. IEEE Int. Carpathian Control Conf. (ICCC)*, 2022, pp. 125–129.
- [2] R. S. Somesula, R. Joshi, and S. Katkoori, "On Feasibility of Decision Trees for Edge Intelligence in Highly Constrained Internet-of-Things (IoT)," in *Proc. Great Lakes Symp. VLSI*. Knoxville, TN, USA: ACM, Jun. 2023, pp. 217–218.
- [3] D. Lu, G. Gao, Y. Shen, and Z. Tong, "Design of gearbox monitoring system based on edge computing," in *Proc. Int. Conf. Artificial Intelligence and Advanced Manufacturing (AIAM)*, 2022, pp. 355–360.
- [4] W. Fang, S.-S. Li, C.-L. Cheng, C.-I. Chang, W.-C. Chen, Y.-C. Liu, M.-H. Tsai, and C. Sun, "CMOS MEMS: A key technology towards the ‘more than Moore’ era," in *Proc. Int. Conf. Solid-State Sensors, Actuators and Microsystems (TRANSDUCERS & EUROSENSORS XXVII)*, 2013, pp. 2513–2518.
- [5] Y. Li, E. Worsey, S. Bleiker, P. Edinger, M. K. Kulsreshath, Q. Tang, A. Y. Takabayashi, N. Quack, P. Verheyen, W. Bogaerts, K. B. Gylfason, D. Pamunuwa, and F. Niklaus, "Integrated 4-terminal single-contact nanoelectromechanical relays implemented in a silicon-on-insulator foundry process," *Nanoscale*, vol. 15, no. 43, pp. 17335–17341, Oct. 2023.
- [6] T. Qin, S. Rana, and D. Pamunuwa, "Design methodologies, models and tools for very-large-scale integration of NEM relay-based circuits," in *Proc. IEEE/ACM Int. Conf. Computer-Aided Design (ICCAD)*. Austin, TX, USA: IEEE/ACM, 2015, pp. 641–648.
- [7] Z. Ullah, "LH-CAM: Logic-based higher performance binary CAM architecture on FPGA," *IEEE Embedded Systems Letters*, vol. 9, no. 2, pp. 29–32, Feb. 2017.
- [8] S. J. Piestrak, "Efficient Hamming weight comparators of binary vectors," *Electronics Letters*, vol. 43, no. 11, pp. 611–612, May 2007.
- [9] M. Shabany and P. G. Gulak, "A 0.13 $\mu$m CMOS 655Mb/s 4x4 64-QAM K-best MIMO detector," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*. San Francisco, CA, USA: IEEE, Feb. 2009, pp. 256–257, 257a.
- [10] A. Wheeldon, R. Shafik, T. Rahman, J. Lei, A. Yakovlev, and O.-C. Granmo, "Learning automata based energy-efficient AI hardware design for IoT applications," *Philosophical Trans. Royal Society A*, vol. 378, no. 2182, p. 20190593, Sep. 2020.
- [11] S. B. Akers, "Binary decision diagrams," *IEEE Trans. Computers*, vol. C-27, no. 6, pp. 509–516, Jun. 1978.
- [12] C.-H. H. Huang and J.-S. Wang, "High-performance and power-efficient CMOS comparators," *IEEE Journal of Solid-State Circuits*, vol. 38, no. 2, pp. 254–262, Feb. 2003.
- [13] H.-M. Lam and C.-Y. Tsui, "A MUX-based high-performance single-cycle CMOS comparator," *IEEE Trans. Circuits and Systems II: Express Briefs*, vol. 54, no. 7, pp. 591–595, Aug. 2007.
- [14] J.-Y. Kim and H.-J. Yoo, "Bitwise competition logic for compact digital comparator," in *Proc. IEEE Asian Solid-State Circuits Conf. (ASSCC)*. Jeju, Korea: IEEE, Nov. 2007, pp. 59–62.
- [15] S. Perri and P. Corsonello, "Fast low-cost implementation of single-clock-cycle binary comparator," *IEEE Trans. Circuits and Systems II: Express Briefs*, vol. 55, no. 12, pp. 1239–1243, Dec. 2008.
- [16] F. Frustaci, S. Perri, M. Lanuzza, and P. Corsonello, "A new low-power high-speed single-clock-cycle binary comparator," in *Proc. IEEE Int. Symp. Circuits and Systems (ISCAS)*. Paris, France: IEEE, Jun. 2010, pp. 317–320.
- [17] A. Kaizerman, S. Fisher, and A. Fish, "Subthreshold dual mode logic," *IEEE Trans. Very Large Scale Integration (VLSI) Systems*, vol. 21, no. 5, pp. 979–983, 2013.
- [18] R. Escobar, L. M. Prócel, L. Trojman, M. Lanuzza, and R. Taco, "High-speed and low-energy dual-mode logic based single-clock-cycle binary comparator," in *Proc. IEEE Latin America Symp. Circuits and Systems (LASCAS)*. Arequipa, Peru: IEEE, Feb. 2021, pp. 1–4.
- [19] P. Chuang, D. Li, and M. Sachdev, "A low-power high-performance single-cycle tree-based 64-bit binary comparator," *IEEE Trans. Circuits and Systems II: Express Briefs*, vol. 59, no. 2, pp. 108–112, Jan. 2012.
- [20] Anjuli and S. Anand, "High-performance 64-bit binary comparator," in *Proc. Int. Conf. Reliability Optimization and Information Technology (ICROIT)*. Faridabad, India: IEEE, Feb. 2014, pp. 512–519.
- [21] S. Rana, T. Qin, A. Bazigos, D. Grogg, M. Despont, C. L. Ayala, C. Hagleitner, A. M. Ionescu, R. Canegallo, and D. Pamunuwa, "Energy and latency optimization in NEM relay-based digital circuits," *IEEE Trans. Circuits and Systems I: Regular Papers*, vol. 61, no. 8, pp. 2348–2359, Apr. 2014.
- [22] D. Grogg, C. L. Ayala, U. Drechsler, A. Sebastian, W. W. Koelmans, S. J. Bleiker, M. Fernandez-Bolanos, C. Hagleitner, M. Despont, and U. T. Duerig, "Amorphous carbon active contact layer for reliable nanoelectromechanical switches," in *Proc. IEEE Int. Conf. Micro Electro Mechanical Systems (MEMS)*. San Francisco, CA, USA: IEEE, Jan. 2014, pp. 143–146.
- [23] S. Rana, J. D. Reynolds, T. Y. Ling, M. S. Shamsudin, S. H. Pu, H. M. Chong, and D. Pamunuwa, "Nano-crystalline graphite for reliability improvement in MEM relay contacts," *Carbon*, vol. 133, pp. 193–199, Mar. 2018.
- [24] S. Rana, J. Mouro, S. J. Bleiker, J. D. Reynolds, H. M. Chong, F. Niklaus, and D. Pamunuwa, "Nanoelectromechanical relay without pull-in instability for high-temperature non-volatile memory," *Nature Communications*, vol. 11, p. 1181, Mar. 2020.
- [25] J. D. Reynolds, S. Rana, E. Worsey, Q. Tang, M. K. Kulsreshath, H. M. Chong, and D. Pamunuwa, "Single-contact, four-terminal microelectromechanical relay for efficient digital logic," *Advanced Electronic Materials*, vol. 9, no. 1, p. 2200584, Sep. 2022.
- [26] R. Nathanael, V. Pott, H. Kam, J. Jeon, and T.-J. K. Liu, "4-terminal relay technology for complementary logic," in *Proc. IEEE Int. Electron Devices Meeting (IEDM)*. Baltimore, MD, USA: IEEE, Dec. 2009, pp. 1–4.
- [27] J. Jeon, V. Pott, H. Kam, R. Nathanael, E. Alon, and T.-J. K. Liu, "Perfectly complementary relay design for digital logic applications," *IEEE Electron Device Letters*, vol. 31, no. 4, pp. 371–373, 2010.
- [28] D. Lee, W. S. Lee, C. Chen, F. Fallah, J. Provine, S. Chong, J. Watkins, R. T. Howe, H.-S. P. Wong, and S. Mitra, "Combinational logic design using six-terminal NEM relays," *IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems*, vol. 32, no. 5, pp. 653–666, 2013.
- [29] Y.-H. Yoon, Y. Jin, C.-K. Kim, S. Hong, and J.-B. Yoon, "A low contact resistance 4-terminal MEMS relay: Theoretical analysis, design, and demonstration," *Journal of Microelectromechanical Systems*, vol. 27, no. 3, pp. 497–505, 2018.
- [30] S. Rana, T. Qin, D. Grogg, M. Despont, Y. Pu, C. Hagleitner, and D. Pamunuwa, "Modelling NEM relays for digital circuit applications," in *Proc. IEEE Int. Symp. Circuits and Systems (ISCAS)*. Beijing, China: IEEE, 2013, pp. 805–808.
- [31] M. Walker, C. Nordquist, D. Czaplewski, G. Patrizi, N. Megruer, and J. Krim, "Impact of in situ oxygen plasma cleaning on the resistance of Ru and Au-Ru based RF microelectromechanical system contacts in vacuum," *Journal of Applied Physics*, vol. 107, no. 8, p. 084509, Apr. 2010.