

# Efficient Ternary Logic Circuits Optimized by Ternary Arithmetic Algorithms

Guangchao Zhao , Zhiwei Zeng, Xingli Wang , Abdelrahman G. Qoutb , Philippe Coquet, Eby G. Friedman , Fellow, IEEE, Beng Kang Tay , Senior Member, IEEE, and Mingqiang Huang 

**Abstract**—Multi-valued logic (MVL) circuits, especially ternary logic circuits, have attracted great attention in recent years due to their higher information density compared with binary logic systems. However, the basic construction method for MVL circuit standard cells and the CMOS fabrication possibility/compatibility issues are still to be addressed. In this work, we propose various ternary arithmetic circuits (adders and multipliers) with embedded ternary arithmetic algorithms to improve the efficiency. First, ternary cycling gates are designed to optimize both the arithmetic algorithms and logic circuits of ternary adders. Second, an optimized ternary Boolean truth table is used to reduce the circuit complexity. Third, high-speed ternary Wallace tree multipliers are implemented with a task dividing policy. Significant improvements in propagation delay and power-delay-product (PDP) have been achieved as compared with previous works. In particular, the ternary full adder shows 11 aJ PDP at 0.5 GHz, which is the best result among all the reported works using the same simulation platform. An average PDP improvement of 36.8% in the ternary multiplier is also achieved. Furthermore, the proposed methods have been successfully explored using standard CMOS 180 nm silicon devices, indicating their great potential for the practical application of ternary computing in the near future.

**Index Terms**—Multi-valued logic, ternary arithmetic circuits, ternary adders, ternary multipliers, CMOS based ternary logic.

## I. INTRODUCTION

WITH the increasing applications of artificial intelligence (AI), Internet of Things (IoT) and autopilot, a tremendous amount of data is being created which needs to be processed. According to predictions, the total data generated every year will rise from 64.2 zettabytes ( $10^{21}$ ) in 2020 to over 180 zettabytes in 2025 [1], [2]. Therefore, computers with higher data density are highly desired to meet the future requirements for data processing.

In another aspect, with the coming end of Moore's law, it becomes significantly more difficult to improve chip performance by simply shrinking the transistor feature size [3]. Nearly 70% of the chip area and over 50% of the dynamic power consumption in digital circuits are attributed to the interconnections [4]. New strategies for “more-than-Moore” are highly desired as a platform for future technologies supporting the big-data era.

Multi-valued logic (logic radix $r > 2$ ) inherently contains a higher number of logic states than classical binary logic (0, 1); therefore, it can achieve greater data density and computing capability. The circuit complexity of a logic system can be expressed as $C(r) = k \cdot r \cdot d$ , where $k$ is a constant, $r$ is the logic radix, and $d$ is the number of digits required to express a certain number $N$ ( $N = r^d$ ). By considering the data density and the circuit complexity, the efficiency of a digital system can be further formulated as $E(r) = N/(k \cdot r \cdot d)$ . For a fixed $N$ , the product $r \cdot d = r \ln N / \ln r$ is minimized at $r = e$ , and among all the integers, 3 is closest to this optimal logic radix. The ternary digital system is therefore believed to carry a higher data density, faster data processing speed, and lower power consumption with reduced circuit complexity as compared to traditional binary logic [5], [6] and other multi-valued logic, which makes ternary logic devices a promising candidate for future electronics.
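The radix argument can be checked numerically. The following is a minimal sketch that assumes the usual radix-economy cost, in which hardware cost is proportional to $r \cdot d$ with $d = \log_r N$ digits. The function name `radix_cost`, the choice $k = 1$, and the sample value of `N` are illustrative assumptions and are not taken from the paper.

```python
import math

def radix_cost(r, n, k=1.0):
    """Cost k * r * d of representing n values in radix r, with d = log_r(n) digits."""
    d = math.log(n) / math.log(r)
    return k * r * d

N = 3 ** 12  # any fixed number of representable values (illustrative)
costs = {r: radix_cost(r, N) for r in (2, 3, 4, 5)}

# Integer radix with the lowest cost under this model
best_integer_radix = min(costs, key=costs.get)
```

Under this cost model the continuous optimum lies at $r = e$, and radix 3 gives the lowest cost among the integer radices tested.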

A series of works have been reported to realize ternary logic, including complementary metal-oxide-semiconductor (CMOS) transistors with resistors [7], memristors [8], 2D semiconductor heterojunctions [9], [10], quantum dots [11], negative capacitance devices [12], [13], and carbon nanotube field-effect transistors (CNTFETs) [14], [15]. In consideration of the comprehensive performance including speed, power, and tunability, the CNTFET is believed to be a promising candidate due to its ballistic carrier transport property, convenient control of the threshold voltage ( $V_{th}$ ), and complete logic synthesis methods of ternary logic functions [15], [16]. So far, almost all ternary circuits have been constructed and simulated using CNTFET SPICE models [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], though they suffer from practical fabrication incompatibility issues [28].

Manuscript received 11 February 2023; revised 9 August 2023; accepted 26 September 2023. Date of publication 19 October 2023; date of current version 6 September 2024. This work was supported by the Ministry of Education, Singapore under Grant AcRF TIER 2- MOE2019-T2-2-075, in part by the Shenzhen Science and Technology Innovation Committee under Grant JCYJ20200109115210307, in part by the National Natural Science Foundation of China under Grant 62106254, and in part by STI 2030-Major Projects under Grant 2022ZZD0210600. (*Guangchao Zhao and Zhiwei Zeng contributed equally to this work.*) (*Corresponding authors:* Beng Kang Tay; Mingqiang Huang.)

Guangchao Zhao and Beng Kang Tay are with the Centre for Micro- and Nano-Electronics (CMNE), School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, and also with the CNRS-International-NTU-THALES-Research-Alliance (CINTRA, UMI 3288), Nanyang Technological University, Singapore 639798 (e-mail: zhao0344@e.ntu.edu.sg; ebktay@ntu.edu.sg).

Xingli Wang is with the CNRS-International-NTU-THALES-Research-Alliance (CINTRA, UMI 3288), Nanyang Technological University, Singapore 639798 (e-mail: wangxingli@ntu.edu.sg).

Zhiwei Zeng and Mingqiang Huang are with the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China (e-mail: zw.zeng@siat.ac.cn; mq.huang2@siat.ac.cn).

Abdelrahman G. Qoutb and Eby G. Friedman are with the Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627 USA (e-mail: a.qoutb@rochester.edu; friedman@ece.rochester.edu).

Philippe Coquet is with the CNRS-International-NTU-THALES-Research-Alliance (CINTRA, UMI 3288), Nanyang Technological University, Singapore 639798, and also with the Institut d'Electronique, de Microélectronique et de Nanotechnologie (IEMN), CNRS UMR 8520, Université de Lille, 59000 Lille, France (e-mail: philippe.coquet@cnrs.fr).

Digital Object Identifier 10.1109/TETC.2023.3321050

Over the past years, tremendous efforts have been devoted to developing ternary adders to achieve efficient data processing [17], [18]. However, the conventional ternary full adder suffers from high circuit complexity, which offsets the aforementioned advantages of ternary logic. Continuous improvements have been made to reduce the transistor count, power consumption, and delay of ternary adders. Still, most full adders suffer from either a large transistor count or a high power-delay-product (PDP). For example, S. Firouzi et al. [19] presented a capacitive-based full adder design with a low transistor count, but the design is highly susceptible to variations and requires a large chip area due to the use of capacitors. S. Kim et al. [15] presented a complete logic synthesis method for ternary logic, developed full adders with low PDP based on CNTFETs, and benchmarked the development. Nevertheless, over 100 transistors are needed, which is almost thrice the transistor count of a binary full adder.

In this paper, we propose high-performance and efficient ternary arithmetic circuits, starting with ternary adders, and then apply the adders to realize efficient ternary multipliers. Various ternary arithmetic algorithms are utilized to optimize the circuits. We develop novel balanced and unbalanced ternary adders based on ternary cycling gates, namely the ternary incremental cycling gate (TIC) and the ternary decremental cycling gate (TDC). A reduced transistor count and a lower PDP are achieved. The logic synthesis method based on multi-threshold voltage MOSFETs [15] is adopted to realize the monadic and dyadic logic functions introduced in our design. Furthermore, we present the design of high-speed ternary Wallace tree multipliers based on the proposed ternary half/full adders. All of the proposed ternary arithmetic circuits are verified and tested in HSPICE simulation and show better performance in transistor count and PDP than reported works. The same compact model is used to provide a fair performance comparison. The main contributions of this paper include:

- Novel unbalanced and balanced ternary half/full adders are proposed based on ternary cycling gates to reduce the transistor count and PDP. Our proposed unbalanced ternary full adder requires only 93 transistors, the lowest to date among the reported purely MOSFET-based full adders. The PDP of the unbalanced and balanced ternary full adders achieves an average improvement of 16.1% and 22.8%, respectively, as compared with the method discussed in [15].
- Ternary Wallace tree multipliers are designed to realize highly efficient ternary multiplication. Owing to the use of the proposed high-performance ternary adders, a higher speed and a lower PDP have been achieved in the multipliers as compared with traditional multipliers. Compared with traditional array multipliers, an average PDP improvement of 17.5% and 36.8% is achieved in the balanced and unbalanced ternary Wallace tree multipliers, respectively. Additionally, a PDP of 23.77 fJ in our unbalanced ternary multiplier and a PDP of 18.30 fJ in our balanced ternary multiplier are the lowest to date among the reported works.
- The proposed methods have been successfully explored using standard CMOS 180 nm silicon devices. In particular, the proposed silicon ternary adder can reach a high speed of about 1 GHz. To the best of our knowledge, this is the first ternary logic circuit that shows performance comparable to that of a binary circuit, indicating its great potential for practical application in the future.

## II. BACKGROUND AND RELATED WORKS

### A. Unbalanced and Balanced Ternary Representations

Ternary logic systems can be represented in two ways: unbalanced ternary logic (UBT: 0, 1, 2), and balanced ternary logic (BT: -1, 0, 1 or T, 0, 1). The balanced ternary number can be encoded for both positive and negative numbers with

$$Y = \sum_{i=0}^{n-1} (y_i \cdot 3^i)$$

where  $y_i$  is the ternary digit (trit) value that can be -1 (or T), 0 or 1, and  $3^i$  represents the base-3 weighting [29], [30].

The coding of unbalanced ternary logic is relatively difficult as it does not contain a negative weighting. To represent negative numbers, a ternary complement coding scheme (similar to the binary system) can be used. The most significant digit is used as the sign trit: 0 represents a positive value, and NOT 0 (namely 1 or 2) represents a negative value. The decimal value of an unbalanced ternary n-trit number is

$$Y = -y_{n-1} \cdot 3^{n-1} + \sum_{i=0}^{n-2} (y_i \cdot 3^i)$$

where $y_i$ is the ternary trit value that can be 0, 1 or 2, and $3^i$ represents the base-3 weighting. Table I shows the ternary representations for different decimal numbers as case studies. It is worth mentioning that despite the difference between the unbalanced and balanced ternary representations, both can be electrically represented by (0, VDD/2, VDD).
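The two encodings can be illustrated with a short sketch. It applies the two formulas above directly; the function names `to_balanced` and `from_unbalanced` are hypothetical helpers introduced only for this illustration.

```python
def to_balanced(n, width):
    """Encode integer n as `width` balanced trits, writing -1 as 'T'."""
    trits = []
    for _ in range(width):
        r = n % 3          # Python's % always returns 0, 1 or 2
        if r == 2:         # a remainder of 2 is written as -1 plus a carry
            r = -1
        n = (n - r) // 3
        trits.append("T" if r == -1 else str(r))
    if n != 0:
        raise ValueError("value does not fit in the given width")
    return "".join(reversed(trits))

def from_unbalanced(code):
    """Value of a sign-trit UBT string: the leading trit has weight -3^(n-1)."""
    digits = [int(c) for c in code]
    n = len(digits)
    low = sum(d * 3 ** (n - 2 - i) for i, d in enumerate(digits[1:]))
    return -digits[0] * 3 ** (n - 1) + low
```

For example, `to_balanced(5, 3)` gives `"1TT"` and `from_unbalanced("122")` gives `-1`, matching the corresponding rows of Table I.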

### B. Ternary Logic Functions and Logic Circuits

Before building up to larger scale CMOS ternary arithmetic units, the necessary background of the ternary logic primitives is presented in this section. In conventional Boolean logic, there is a total of four ( $2^2$ ) monadic functions. The single nontrivial function is NOT, which inverts its input, returning false when given true and vice versa. The move to ternary increases the number of monadic functions from 4 to 27 ( $3^3$ ), which are enumerated in Table II. Among the 27 monadic functions, three are nontrivial, namely the standard ternary inverter (STI), negative ternary inverter (NTI), and positive ternary inverter (PTI). NTI and PTI can be considered ternary to binary conversion functions because they have only two-valued outputs.

**TABLE I**  
TERNARY REPRESENTATIONS

| Decimal | Binary Code<br>(sign-magnitude) | Binary Code<br>(two's complement) | Unbalanced<br>Ternary (UBT)<br>(0 1 2) | Balanced<br>Ternary (BT)<br>(T 0 1) |
|---------|---------------------------------|-----------------------------------|----------------------------------------|-------------------------------------|
| 5       | 00101                           | 00101                             | 012                                    | 1TT                                 |
| 1       | 00001                           | 00001                             | 001                                    | 001                                 |
| 0       | 00000                           | 00000                             | 000                                    | 000                                 |
| -1      | 10001                           | 11111                             | 122                                    | 00T                                 |
| -5      | 10101                           | 11011                             | 111                                    | T11                                 |
| -9      | 11001                           | 10111                             | 100                                    | T00                                 |
| -10     | 11010                           | 10110                             | 222                                    | T0T                                 |
| -11     | 11011                           | 10101                             | 221                                    | TT1                                 |

**TABLE II**  
TERNARY MONADIC FUNCTIONS

In most previous works, ternary logic circuits are designed using only the three inverters mentioned above [14], while all of the other ternary functions are ignored, which makes the ternary circuits suffer from either a large transistor count or a high PDP. In the following sections, our ternary logic circuits are designed with optimized ternary arithmetic algorithms that utilize more ternary functions.
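The space of monadic functions is small enough to enumerate. The sketch below represents each function as its truth table over the unbalanced values (0, 1, 2) and uses the standard definitions of the three inverters. The ordering of `MONADIC` is arbitrary and is not the numbering of Table II, which is not reproduced here.

```python
from itertools import product

# A monadic ternary function is a truth table (f(0), f(1), f(2)).
MONADIC = list(product((0, 1, 2), repeat=3))  # 3**3 = 27 tables

STI = (2, 1, 0)  # standard ternary inverter
NTI = (2, 0, 0)  # negative ternary inverter: only input 0 yields 2
PTI = (2, 2, 0)  # positive ternary inverter: only input 2 yields 0

def apply(table, x):
    """Evaluate a monadic function, given as a truth table, at input x."""
    return table[x]
```

NTI and PTI only ever output 0 or 2, which is why they can serve as ternary to binary converters.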

## III. UNBALANCED TERNARY ADDERS

Similar to a binary full adder, a ternary full adder can be divided into two sequential addition stages: the first stage adds two inputs, and the second stage adds the Sum output of the first stage and the third input signal. Figure 1 shows the proposed ternary full adder with two-stage addition. Each stage can be implemented with a ternary half adder, namely the first stage ternary half adder (HA) and the second stage half adder (HA').

Figure 1. Proposed unbalanced ternary full adder, which is divided into two half adders.

HA performs the addition of inputs A and B, which generates the summation signal  $\text{Sum}_{AB}$  and the carry signal  $\text{Carry}_{AB}$ . HA' afterwards performs the addition of  $\text{Sum}_{AB}$  and the input carry signal  $C_{in}$ , which generates the summation signal  $\text{Sum}_{ABC}$  (the final summation result Sum) and the carry signal  $\text{Carry}_{ABC}$ . The output carry signal  $C_{out}$  can be generated using a carry signal generation gate (CarryCom gate in Figure 1), whose inputs are the carry output signals of the two half adders.

### A. Design of the Unbalanced Ternary First Stage Half Adder

The truth table and the design of the HA are shown in Figure 2(a). The HA is further divided into a summation signal generation gate (Sum) and a carry signal generation gate (Carry).

For the HA, the inputs are two ternary numbers A and B, and the outputs are the summation signal (Sum) and carry signal (Carry). Note that Carry will only be 0 or 1. For example, when  $A = 1$  (2),  $B = 2$  (1), the original  $A+B = 3$ , such that  $\text{Sum} = 0$  and  $\text{Carry} = 1$ . When  $A = 2$  and  $B = 2$ , the original  $A+B = 4$ , such that  $\text{Sum} = 1$  and  $\text{Carry} = 1$ . Carry is 0 in the other input situations.
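The behaviour described above can be captured in a few lines. This is a behavioural sketch of the truth table only, not of the transistor-level circuit; the name `half_adder` is illustrative.

```python
def half_adder(a, b):
    """Unbalanced ternary half adder: returns (Sum, Carry) for a, b in {0, 1, 2}."""
    total = a + b
    return total % 3, total // 3

# Full truth table over all nine input combinations
truth_table = {(a, b): half_adder(a, b) for a in range(3) for b in range(3)}
```

Since the largest possible total is 4, the carry never exceeds 1, which is the property exploited by the half VDD scheme.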

Utilizing the logic synthesis method reported in [15], the Carry gate of the HA was originally realized as shown in Figure 2(b). However, the pass transistors are not necessarily needed due to the elimination of the VDD pull-up network and the half VDD pull-down network. Hence, we propose a half VDD method to eliminate the pass transistors, so that the circuit complexity can be reduced and the delay and power can be optimized. The transformation of the Carry signal from the original design to our optimized design (new Carry) is shown in the truth table in Figure 2(a). The new Carry signals are either in the high or low logic state under the half VDD scheme, and the pass transistors used to generate the middle logic state can be eliminated. The sum of product (SOP) of the Carry gate under the half VDD scheme is: half VDD/GND pull-up network  $F = (A_1 + A_2) * B_2 + A_2 * (B_1 + B_2)$ , and half VDD/GND pull-down network  $F = A_0 + (A_0 + A_1) * (B_0 + B_1) + B_0$ .
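The two sum-of-product expressions can be checked against the Carry truth table. The sketch assumes that $A_k$ (and $B_k$) is true exactly when input A (or B) equals $k$; this reading is an assumption made for the check, and the function names are illustrative.

```python
def carry_pull_up(a, b):
    """(A1 + A2)*B2 + A2*(B1 + B2), with Xk true when X == k (assumed reading)."""
    A = [a == k for k in range(3)]
    B = [b == k for k in range(3)]
    return ((A[1] or A[2]) and B[2]) or (A[2] and (B[1] or B[2]))

def carry_pull_down(a, b):
    """A0 + (A0 + A1)*(B0 + B1) + B0, under the same reading."""
    A = [a == k for k in range(3)]
    B = [b == k for k in range(3)]
    return A[0] or ((A[0] or A[1]) and (B[0] or B[1])) or B[0]
```

Under this reading the pull-up expression is true exactly when A + B is at least 3, and the pull-down expression is its complement, so the two networks never conduct at the same time.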

A transistor-level schematic of the optimized Carry gate is presented in Figure 2(c). As compared with the original design, fewer transistors are required. Additionally, the PDP of the new Carry gate is 0.499 aJ, while the PDP of the original design is 0.844 aJ. An improvement of 40.88% in PDP is achieved in the proposed design.

**Figure 2.** (a) Schematic and truth table of the first stage unbalanced ternary half adder. (b) Schematic of original Carry gate and (c) Carry gate with optimized ternary Boolean truth table from “logic 1” to “logic 2”. VDD is set to be 0.9 V and the white and stripe pattern transistor symbols represent MOSFETs with $V_{th}$ values of 0.323 V and 0.428 V, respectively.

Before interpreting the Sum gate, we elaborate on the two kinds of ternary cycling gates. The cycling function only exists in higher radix logics, and it degenerates into the inversion function in binary logic. Ternary cycling gates are among the 27 ( $3^3$ ) kinds of ternary monadic functions [20]. There are two kinds of ternary cycling gates: TIC and TDC. TIC can be expressed as

$$Y = (x + 1) \bmod 3$$

and it performs a plus 1 operation of the input signal (output = input + 1), while TDC can be expressed as

$$Y = (x - 1) \bmod 3$$

and it performs a minus 1 operation of the input signal (output = input - 1), which also equals the plus 2 operation (output = input + 2) in the case of unbalanced ternary logic.
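A behavioural sketch of the two cycling gates follows directly from the two expressions above. The function names `tic` and `tdc` mirror the gate names and are otherwise illustrative.

```python
def tic(x):
    """Ternary incremental cycling gate: (x + 1) mod 3."""
    return (x + 1) % 3

def tdc(x):
    """Ternary decremental cycling gate: (x - 1) mod 3."""
    return (x - 1) % 3
```

In unbalanced ternary, `tdc(x)` equals `(x + 2) % 3`, and the two gates are inverses of each other.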

The TIC and the TDC are realized using a ternary logic synthesis method similar to that in [15]. This method realizes arbitrary ternary logic functions by dividing the circuit into four blocks, namely VDD pull-up and pull-down networks and half VDD pull-up and pull-down networks. In each network, multi-threshold voltage MOSFETs are used to collocate the switch operations and generate the output according to the truth table. In the TIC and the TDC, CNTFETs with three kinds of threshold voltages are used, namely $\pm 0.323$ V, $\pm 0.428$ V and $\pm 0.687$ V.

Specifically, the n-type CNTFETs with $+0.323$ V and $+0.687$ V threshold voltages are used to collocate different switch operations in the VDD and half VDD pull-down networks; the p-type CNTFETs with $-0.323$ V and $-0.687$ V threshold voltages are used to collocate different switch operations in the VDD and half VDD pull-up networks; and the CNTFETs with $\pm 0.428$ V threshold voltage, close to half VDD, are used as the pass transistors to generate a stable middle state at the output terminal.

As shown in the Sum gate in Figure 3(a), the augend signal B is transformed into $B+1$ and $B+2$ by the TIC and TDC, respectively. After the transformation, three transmission gates transmit the output signal by selecting from $B+1$ , $B+2$ , and $B+0$ , depending on the value of the addend signal A. One 1:3 decoder, comprised of two NTIs, a PTI, and a binary NOR gate (shown in Figure 3(b)), is used to generate the control signals of the transmission gates. Only one decoder output is at the high logic level at a time, so exactly one of the three transmission gates is turned on. Specifically, if the addend signal A is 2, the decoder output ( $A_0, A_1, A_2$ ) is (2, 0, 0) to turn on the TDC path and transmit $B+2$ to the output; if the addend signal A is 1, the decoder output ( $A_0, A_1, A_2$ ) is (0, 2, 0) to turn on the TIC path and transmit $B+1$ to the output; if the addend signal A is 0, the decoder output ( $A_0, A_1, A_2$ ) is (0, 0, 2) to directly transmit B to the output through the transmission line.
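The selection performed by the Sum gate can be modelled behaviourally. The sketch follows the decoder convention stated in the text, with outputs ( $A_0, A_1, A_2$ ) equal to (2, 0, 0), (0, 2, 0) and (0, 0, 2) for A = 2, 1 and 0. The function names and the list-based selection are illustrative and do not model the transmission gates electrically.

```python
def decoder(a):
    """One-hot 1:3 decoder: returns (A0, A1, A2) with a single output at logic 2."""
    return (2 if a == 2 else 0, 2 if a == 1 else 0, 2 if a == 0 else 0)

def sum_gate(a, b):
    """Select B, B+1 or B+2 (mod 3) according to the decoded addend A."""
    a0, a1, a2 = decoder(a)
    paths = (
        (a0, (b - 1) % 3),  # TDC path, equivalent to B + 2
        (a1, (b + 1) % 3),  # TIC path, B + 1
        (a2, b),            # direct transmission line, B + 0
    )
    # Exactly one transmission gate is enabled, so exactly one value is selected.
    (selected,) = [value for enable, value in paths if enable == 2]
    return selected
```

Over all nine input pairs, `sum_gate(a, b)` equals `(a + b) % 3`.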

The transistor-level schematic, symbol, and transient simulation results of the two kinds of ternary cycling gates are shown in Figure 3(c) and (d).  $A_N$  and  $A_P$  in the figures mean that the input signal first passes through a negative ternary inverter (NTI) or a positive ternary inverter (PTI) before connecting with the gate terminal of the corresponding MOSFET. The cycling gate has largely reduced the total number of MOSFET transistors and propagation delay. Detailed comparisons are presented in part D of this section.

### B. Design of the Unbalanced Ternary Second Stage Half Adder

The cascaded multi-trit adder chain always starts with a two-input ternary half adder. The output carry signal ( $C_{out}$ ) generated in the first stage half adder is treated as the input carry signal ( $C_{in}$ ) for the second stage addition. As indicated in the truth table in Figure 4(a), the  $C_{in}$  signal can be either 0 or 1. For example, the carry of  $(0 + 0)$ ,  $(0 + 1)$  and  $(0 + 2)$  is 0, while only the carry of  $(1 + 2)$  and  $(2 + 2)$  is 1. The  $C_{in}$  with fewer logic states simplifies the half adder in the second stage.

Figure 4(a) shows the cycling gate-based half adder in the second stage (HA'). The summation signal of the first stage half adder ( $Sum_{AB}$ ) is regarded as the augend signal, while $C_{in}$ is treated as the addend signal. Similar to the cycling gate-based HA, the augend signal $Sum_{AB}$ of the HA' is directly connected to a transmission gate or a TIC (+1 operation), which is determined by the control signal $C_{in}$ . TDC (+2 operation) is not needed here due to the reduced logic level condition of $C_{in}$ . The summation is either $Sum_{AB}$ or $Sum_{AB} + 1$ , depending on the value of $C_{in}$ . Specifically, if $C_{in}$ is 0, the path connected with $Sum_{AB}$ is turned on, transmitting $Sum_{AB}$ to the output terminal; if $C_{in}$ is 1, the path connected with TIC is turned on, transmitting $Sum_{AB} + 1$ to the output terminal.

Figure 3. (a) Schematic of the Sum gate in the first stage half adder. (b) 1:3 decoder used in the Sum gate. (c) Transistor-level schematic, symbol, truth table and simulation result of the TIC, and (d) TDC gate. VDD is set to be 0.9 V and the white, stripe pattern, and black transistor symbols represent MOSFETs with $V_{th}$ values of 0.323 V, 0.428 V, and 0.687 V, respectively.

To maintain better control of transmission gates by  $C_{in}$ , a positive buffer (PB, shown in Figure 4(b)) that shifts the logic level of  $C_{in}$  from 0.45 V (half VDD) to 0.9 V (VDD) is designed. The function of PB can be expressed as

$$Y = \begin{cases} 2, & x = 1 \\ 0, & x = 0 \end{cases}$$

which is among the 27 monadic functions (Function No. 25 in Table II). The transmission gates can be fully switched on or off by the control signal  $C_{in}$  passing through a PB. Therefore, the distortion of the signal passing through the transmission gates is reduced.

The Carry signal of the second stage half adder should be 1 only when $Sum_{AB}$ is in the high logic state (2) and $C_{in}$ is in the middle logic state (1). Otherwise, it should be 0. This function can be realized using a binary AND gate, whose inputs are $C_{in}$ after passing through a PB and $Sum_{AB}$ after passing through a negative buffer (NB, shown in Figure 4(c)). The NB is introduced here to retain the high logic state of $Sum_{AB}$ while shifting other logic states into a low level, whose function can be expressed as:

$$Y = \begin{cases} 2, & x = 2 \\ 0, & x = 0, 1 \end{cases}$$

which is also among the 27 monadic functions (No. 19 in Table II).
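The second stage half adder can be sketched from the PB and NB definitions above. The dictionaries `PB` and `NB` encode only the input values given in the text (PB is specified for inputs 0 and 1 only), and the function name `second_half_adder` is illustrative. The carry is returned as the logic value 1, which is a modelling assumption about the output level.

```python
PB = {0: 0, 1: 2}        # positive buffer, as specified for C_in in {0, 1}
NB = {0: 0, 1: 0, 2: 2}  # negative buffer: keeps 2, maps 0 and 1 to 0

def second_half_adder(sum_ab, c_in):
    """HA': returns (Sum_ABC, Carry_ABC) for sum_ab in {0, 1, 2} and c_in in {0, 1}."""
    select_tic = PB[c_in] == 2
    # C_in = 1 enables the TIC path (+1); C_in = 0 passes Sum_AB unchanged.
    total = (sum_ab + 1) % 3 if select_tic else sum_ab
    # Binary AND on the full-swing buffered signals (each is 0 or 2).
    carry = 1 if (PB[c_in] == 2 and NB[sum_ab] == 2) else 0
    return total, carry
```

The only input pair that produces a carry is `sum_ab = 2` with `c_in = 1`.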

### C. Design of the Unbalanced Ternary Full Adder

The optimized design of an unbalanced ternary full adder is realized by sequentially combining the two stage half adders, as shown in Figure 5(a). The final Carry signal of the full adder is generated based on the Carry signals of the two stage half adders. If either of these two Carry signals is 1, the final Carry of the full adder is 1. A gate (CarryCom) combines the Carry signals of the two half adders and generates the Carry signal of the full adder. Similarly, the CarryCom gate is regarded as a binary logic function, in which the half VDD supply voltage is used to reduce the transistor count and PDP. The proposed half/full adders are verified in Synopsys HSPICE. To compare with reported works, the same SPICE model (Stanford 32 nm CNT MOSFET model) is used in the simulations [31]. Figure 5(b) presents the transient waveform of the proposed unbalanced ternary full adder. The correct logic functions are verified with inputs covering all situations.

Figure 4. (a) Schematic and truth table of the proposed second stage half adder. (b) Transistor-level schematic, symbol, and truth table of PB. (c) Transistor-level schematic, symbol, and truth table of NB.

Figure 5. (a) Gate-level schematic of proposed cycling gate-based unbalanced ternary full adder. (b) Transient simulation waveform of the proposed unbalanced ternary full adder based on ternary cycling gates.

Figure 6. Performance tests of the proposed unbalanced ternary half adder (a) and full adder (b) at different frequencies.
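The two-stage composition of the unbalanced full adder can be checked exhaustively at the behavioural level. This sketch models only the logic (two half adders followed by CarryCom), not the circuit; all names are illustrative, and $C_{in}$ is restricted to 0 or 1 as in the text.

```python
def half_adder(a, b):
    """Behavioural ternary half adder: returns (sum, carry)."""
    return (a + b) % 3, (a + b) // 3

def full_adder(a, b, c_in):
    """Unbalanced ternary full adder built from HA, HA' and CarryCom."""
    sum_ab, carry_ab = half_adder(a, b)            # first stage (HA)
    sum_abc, carry_abc = half_adder(sum_ab, c_in)  # second stage (HA'), c_in in {0, 1}
    # CarryCom: the final carry is 1 if either half adder produced a carry.
    c_out = 1 if (carry_ab == 1 or carry_abc == 1) else 0
    return sum_abc, c_out
```

For every input combination, `a + b + c_in` equals `sum + 3 * c_out`, and the two half adders never produce a carry at the same time, which is why a simple OR-type combination suffices.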

The propagation delay, power consumption and PDP of the proposed unbalanced half/full adders are tested under different operating frequencies. Figure 6 presents a comparison of the performance with the half/full adders reported in [15] at an input frequency ranging from 100 MHz to 1 GHz. Figure 6(a) compares the characteristics of the half adders. The two half adders have a similar delay at different frequencies. The proposed half adder shows higher power consumption, which leads to a higher PDP. However, the proposed unbalanced ternary full adder has a significant advantage in propagation delay and PDP despite the higher average power, as shown in Figure 6(b). An average improvement of 61.6% in delay and 16.1% in PDP is achieved. Specifically, a 57 ps delay and an 11 aJ PDP are demonstrated at a frequency of 0.5 GHz, corresponding to a 63.1% reduction in propagation delay and a 20.8% improvement in PDP. Furthermore, the improvement grows with increasing frequency. The PDP improvement reaches 23.1% and 23.9% at 800 MHz and 1 GHz, respectively.

Table III shows a more comprehensive comparison of the characteristics of the unbalanced ternary full adder between our design and the benchmarks. To provide a fair comparison, only CNTFET-based ternary circuits are included, and the same SPICE model is used for the circuit simulation and tests. Our ternary full adder achieves the lowest PDP (11 aJ) as compared with other reported works to date [14], [15], [21], [22], [23], [24], [25], [26], [27]. Another noteworthy merit of our design is the reduced transistor count. 56 transistors are needed to realize an unbalanced ternary half adder, and only 93 transistors are used to realize a ternary full adder. To the best of our knowledge, this adder is the most compact design of an unbalanced ternary full adder purely based on MOSFETs.

**TABLE III**  
PERFORMANCE COMPARISON OF TERNARY FULL ADDERS

| Transistor count | Delay (ps) | Average power (µW) | PDP (aJ)  | Design           |
|------------------|------------|--------------------|-----------|------------------|
| 106              | 139.41     | 1.8933             | 263.94    | [21]             |
| 132              | 102.45     | 1.9120             | 195.88    | [22]             |
| 103              | 66.17      | 1.9341             | 127.97    | [23]             |
| 318              | 88         | 1.363              | 120       | [14]             |
| 105              | 68         | 1.129              | 76        | [24]             |
| 142              | 43.95      | 1.4718             | 64.68     | [25]             |
| 106              | 131        | 0.527              | 69        | [26]             |
| 98               | 123        | 0.463              | 57        | [27]             |
| 106              | 269        | 0.128              | 34        | [15]             |
| **93**           | **57**     | **0.1928**         | **11**    | **this work**    |
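The PDP column of Table III is the product of the delay and average power columns, and the units work out directly: 1 ps multiplied by 1 µW is $10^{-18}$ J, or 1 aJ. The sketch below recomputes the column from the values copied from the table; the variable and function names are illustrative.

```python
# (design, delay in ps, average power in uW, reported PDP in aJ), from Table III
TABLE_III = [
    ("[21]", 139.41, 1.8933, 263.94),
    ("[22]", 102.45, 1.9120, 195.88),
    ("[23]", 66.17, 1.9341, 127.97),
    ("[14]", 88, 1.363, 120),
    ("[24]", 68, 1.129, 76),
    ("[25]", 43.95, 1.4718, 64.68),
    ("[26]", 131, 0.527, 69),
    ("[27]", 123, 0.463, 57),
    ("[15]", 269, 0.128, 34),
    ("this work", 57, 0.1928, 11),
]

def pdp_aj(delay_ps, power_uw):
    """Power-delay product in attojoules: 1 ps * 1 uW = 1e-18 J = 1 aJ."""
    return delay_ps * power_uw

recomputed = {name: pdp_aj(d, p) for name, d, p, _ in TABLE_III}
```

For the proposed adder, 57 ps multiplied by 0.1928 µW gives approximately 10.99 aJ, consistent with the reported 11 aJ.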

## IV. BALANCED TERNARY ADDERS

In the ternary system, the same circuit can be used for both unbalanced and balanced ternary logic, and both can be used to implement one specific ternary logic function. The only difference is the encoding method. In balanced ternary logic, the high logic level (VDD) represents 1, the middle logic level (half VDD) represents 0, and the low logic level (GND) represents $-1$ , corresponding to 2, 1 and 0 in unbalanced ternary logic, respectively. This feature makes it possible to design mixed encoding ternary systems.

### A. Design of the Balanced Ternary Half Adder

Figure 7(a) is the schematic of the balanced ternary half adder, which consists of a Sum gate and a Carry (Cons) gate. The Sum gate is based on cycling gates, similar to the aforementioned unbalanced ternary half adder. An identical 1:3 decoder generates the control signals of the transmission gates by connecting to the addend signal A, as shown in Figure 7(b). Likewise, the TIC and TDC perform the $+1$ and $-1$ operations on the input signal before it is transmitted to the output terminal through the transmission gate. When the addend signal A is $-1$ , only $A_2$ of the three decoder outputs is in a high logic state ( $(A_0, A_1, A_2) = (-1, -1, 1)$ ) to turn on the TDC path and transmit $B-1$ to the output through the transmission gate; when A is 0, only $A_1$ is in the high logic state ( $(A_0, A_1, A_2) = (-1, 1, -1)$ ) to directly transmit the augend signal B to the output through the transmission gate; when A is 1, only $A_0$ is in the high logic state ( $(A_0, A_1, A_2) = (1, -1, -1)$ ) to turn on the TIC path and transmit $B+1$ to the output through the transmission gate.

Figure 7. (a) Schematic and truth table of the proposed balanced ternary half adder. (b) Schematic of the proposed cycling gate-based Sum gate. (c) Transistor-level schematic of the Carry (Cons) gate of the balanced ternary half adder.

Different from the unbalanced full adder, three logic states are needed to represent the Carry signal in the balanced adder as shown in the truth table in Figure 7(a). The Carry gate of the balanced half adder uses a ternary consensus (Cons) gate. The Cons gate (Figure 7(c)) is an extension of a binary exclusive OR (XOR) gate, whose output is 1 if both inputs are 1,  $-1$  if both inputs are  $-1$ , and otherwise 0.
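The balanced half adder can be sketched behaviourally from the Cons definition above. The function names are illustrative, and `balanced_sum` models the result of the cycling-gate Sum gate as a wrap-around into the range $\{-1, 0, 1\}$ rather than the gate itself.

```python
def cons(a, b):
    """Consensus: +1 if both inputs are +1, -1 if both are -1, otherwise 0."""
    return a if (a == b and a != 0) else 0

def balanced_sum(a, b):
    """Sum trit of a + b, wrapped into {-1, 0, 1}."""
    return (a + b + 1) % 3 - 1

def balanced_half_adder(a, b):
    """Returns (Sum, Carry) for balanced trits a, b in {-1, 0, 1}."""
    return balanced_sum(a, b), cons(a, b)
```

For instance, 1 + 1 yields Sum = -1 and Carry = 1, since 2 = 3 - 1, and in every case `a + b` equals `Sum + 3 * Carry`.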

### B. Design of the Balanced Ternary Full Adder

The balanced ternary full adder shown in Figure 8(a) [32], [33] is also divided into two sequential half adders. The full adder contains the same Sum gates proposed in the half adder. Two ternary dyadic functions, namely the Not consensus gate (Ncons) and the Not accept anything gate (Nany), generate the Carry signals. The Ncons and Nany gates are standard inverse functions of the consensus (Cons) and accept anything (Any) gates, respectively. Owing to the simplified transistor-level realization of the inversion logic functions, a reduced transistor count and total circuit complexity can be achieved by using Ncons and Nany instead of Cons and Any gates.

The transistor-level schematics and truth tables of the Ncons and Nany gates are shown in Figure 8(b) and (c), respectively. The ternary Ncons gate has an output of $-1$ if both inputs are 1, an output of 1 if both inputs are $-1$, and an output of 0 in all other cases. For each half adder in the balanced full adder, the inputs are the ternary numbers A and B, and the outputs are Sum and Carry. When A = 1 and B = 1, the original A+B = 2 and the original Carry = 1, so the inverted Carry is $-1$. When A = $-1$ and B = $-1$, the original A+B = $-2$ and the original Carry = $-1$, so the inverted Carry is 1.



Figure 8. (a) Schematic of the balanced ternary full adder. Transistor-level schematic, symbol, and truth table of the Ncons gate (b) and Nany gate (c). (d) Transient simulation waveform of the proposed balanced ternary full adder.

As the inverse of the Any gate, the ternary Nany gate has an output of $1$ if both inputs are $-1$ ($A = -1, B = -1$) or one of the inputs is $-1$ and the other input is $0$ ($A = -1(0), B = 0(-1)$); an output of $0$ if both inputs are $0$ ($A = 0, B = 0$) or one of the inputs is $-1$ and the other input is $1$ ($A = -1(1), B = 1(-1)$); and an output of $-1$ if both inputs are $1$ ($A = 1, B = 1$) or one of the inputs is $1$ and the other input is $0$ ($A = 1(0), B = 0(1)$). Therefore, a Nany gate can be used to combine the two inverted carry signals of the half adders and generate the final Carry signal.
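A minimal behavioural sketch of the carry path is given below. It rests on several assumptions that are not taken from Figure 8: the two half adders are cascaded (A and B first, then the intermediate sum and the carry-in), both half adders use an Ncons gate for their inverted carry, and Nany is modelled as the inverse of Any, where Any returns the sign of the sum of its inputs. All function names are illustrative. Under these assumptions, the check over all 27 input combinations confirms that the Nany output is the true (non-inverted) Carry.

```python
from itertools import product

TRITS = (-1, 0, 1)

def sum_gate(a, b):
    """Balanced ternary sum trit of a + b (wraps around at +/-2)."""
    return (a + b + 1) % 3 - 1

def ncons(a, b):
    """Inverse of the consensus gate."""
    if a == b == 1:
        return -1
    if a == b == -1:
        return 1
    return 0

def nany(a, b):
    """Inverse of the accept-anything gate (Any = sign of a + b)."""
    total = a + b
    if total > 0:
        return -1
    if total < 0:
        return 1
    return 0

def full_adder(a, b, cin):
    """Two cascaded half adders; Nany merges the two inverted carries."""
    s1 = sum_gate(a, b)
    nc1 = ncons(a, b)        # inverted carry of the first half adder
    s = sum_gate(s1, cin)
    nc2 = ncons(s1, cin)     # inverted carry of the second half adder
    return s, nany(nc1, nc2)

# Exhaustive check: A + B + Cin = 3 * Carry + Sum for all 27 combinations.
for a, b, cin in product(TRITS, repeat=3):
    s, c = full_adder(a, b, cin)
    assert a + b + cin == 3 * c + s
```

The check relies on the fact that the two half-adder carries can never be non-zero with the same sign, so the sign of their sum equals their sum.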

### C. Performance of the Balanced Ternary Full Adder

A transient simulation of the proposed cycling gates-based balanced ternary full adder is shown in Figure 8(d). The correct logic function has been verified by covering all input conditions. Additionally, the propagation delay, power consumption, and PDP of the proposed balanced half/full adders are tested at different frequencies and compared with the reported method (Figure 9). Both the proposed half and full balanced adders based on cycling gates show better performance in terms of propagation delay. As compared with the benchmark [15], an average improvement of 43.51% and 54.26% in the delay is achieved in the proposed half adder and full adder, respectively. Most importantly, the proposed full adder shows better PDP performance at all operational frequencies from 100 MHz to 1 GHz and achieves an average improvement of 22.8% in PDP. The full adder achieves a 46 aJ PDP at 0.5 GHz, lower than the 57 aJ PDP in [15]. The improvement is more obvious at higher frequencies. Specifically, a 54.6% improvement in propagation delay and an 18.5% improvement in PDP are achieved at 0.5 GHz. The improvements in delay and PDP exceed 56% and 52%, respectively, at 1 GHz.

Figure 9. Performance tests of the proposed balanced ternary half adder (a) and full adder (b) at different frequencies.

## V. TERNARY WALLACE TREE MULTIPLIERS BASED ON CYCLING GATES ADDERS

Multiplication is a common arithmetic operation in digital systems, particularly in current deep learning models based on matrix-vector multiplication (MVM) [34]. However, training a standard model on the most advanced GPUs usually requires several days and dissipates thousands of kilowatt hours (kWh) [35]. Thus, achieving high-speed multiplication with low power consumption is a major concern in very large-scale integration (VLSI) circuits. Ternary logic inherently has higher data density and processing speed than binary logic, which improves the speed and energy efficiency of the multiplication operation in integrated circuits.

A Wallace tree multiplier is a high-speed parallel multiplier commonly used in VLSI circuits. The delay of an $n$-trit multiplication is reduced from $O(n)$ to $O(\log n)$ by adopting a Wallace tree structure instead of a traditional array structure. The operation of the Wallace tree multiplier can generally be divided into three phases [36], [37]:

1. Generation of the partial products (PP)
2. Parallel accumulation of the PP using full and half adders
3. Final addition to generate the result
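The logarithmic depth of the accumulation phase can be illustrated with a short sketch. It assumes the usual row-based reduction, in which every group of three rows is compressed into two rows (a sum row and a carry row) and leftover rows are passed on unchanged, until two rows remain for the final adder. The function name `reduction_stages` and the starting row counts are illustrative assumptions.

```python
def reduction_stages(rows):
    """Count the 3:2 reduction stages needed until only two rows remain."""
    stages = 0
    while rows > 2:
        # Each full group of three rows becomes two rows; leftovers pass through.
        rows = 2 * (rows // 3) + rows % 3
        stages += 1
    return stages

# Example row counts: the stage count grows roughly with log(rows).
for rows in (6, 12, 24, 48):
    print(rows, "rows ->", reduction_stages(rows), "reduction stages")
```

Starting from 6 rows the sequence is 6, 4, 3, 2, and starting from 12 rows it is 12, 8, 6, 4, 3, 2, so doubling the number of rows adds only a few stages, whereas an array structure adds one row per stage.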

In the following section, we use a 6-trit by 6-trit multiplication as an example to present the design of ternary Wallace tree multipliers. Following the steps of the Wallace tree multiplier mentioned above, both unbalanced and balanced ternary Wallace tree multipliers were designed. Our proposed cycling gates-based ternary adders are adopted to optimize the performance of the multiplication.

### A. Design of Unbalanced Ternary Wallace Tree Multiplier

In the first stage of operation of a Wallace tree multiplier, 1-trit ternary multipliers generate the partial products. For the multiplication $2 \times 2$, both the Product and the Carry are 1. Therefore, the 1-trit unbalanced ternary (0, 1, 2) multiplier needs a Carry signal along with the Product signal to represent the output. During the PP generation stage of an N-trit by N-trit unbalanced ternary Wallace tree multiplier, $2N^2$ elements are produced.
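The Product and Carry of the 1-trit unbalanced multiplier follow directly from the radix-3 decomposition of $a \times b$. The sketch below enumerates the truth table; the helper names `mul1_unbalanced` and `partial_product_count` are illustrative and not taken from the paper.

```python
def mul1_unbalanced(a, b):
    """1-trit unbalanced multiplier: returns (Product, Carry) of a * b."""
    p = a * b
    return p % 3, p // 3

def partial_product_count(n):
    """Elements generated in Stage 0: one Product and one Carry per trit pair."""
    return 2 * n * n

# Truth table over the digits 0, 1, 2.
table = {(a, b): mul1_unbalanced(a, b) for a in range(3) for b in range(3)}

# Only 2 x 2 = 4 = (11) in base 3 produces a non-zero Carry.
assert table[(2, 2)] == (1, 1)
assert [ab for ab, (_, c) in table.items() if c != 0] == [(2, 2)]
assert partial_product_count(6) == 72
```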

The proposed flow of the unbalanced ternary Wallace tree multiplier is shown in Table IV and is divided into six stages of addition. During Stage 0, the partial products (PP) and the corresponding Carry signals ( $C_{PP}$ ) are generated by multiplying each pair of trits using a 1-trit by 1-trit unbalanced ternary multiplier. For instance, the multiplication of B0 in the multiplier B and A0 in the multiplicand A generates PP0 and $C_{PP0}$, while the multiplication of B5 and A5 simultaneously generates PP35 and $C_{PP35}$. Since the Carry signals ( $C_{PP}$ ) have a higher weight than the Product, $C_{PP}$ is placed in the next column.

The 6-trit by 6-trit unbalanced ternary Wallace tree multiplier has 72 elements, which are summed using the aforementioned cycling gates-based half/full adders. In each stage, every three rows are grouped together. If one or two rows remain after grouping, they are passed to the next stage without processing. Full adders are used in each column that contains three trits, while half adders are used in each column that contains two trits. The adders generate a summation result S (starting with S0 in Table IV) and a carry result C that is sent to the next higher weight (starting with C0 in Table IV); both are treated as new elements in the next stage. This process is repeated until there are no more than two elements in every weight. Finally, a carry-propagate adder (CPA) performs the final addition to generate the multiplication result. Note that the result of a 6-trit by 6-trit unbalanced ternary multiplication can be expressed as a 12-trit ternary number, so C49 and C63, which are listed in the 13th trit, are both 0. Therefore, a 6-trit CPA instead of a 7-trit CPA calculates the result.
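The overall flow (partial product generation, parallel accumulation, final carry-propagate addition) can be mimicked by an arithmetic sketch. It deliberately simplifies the scheme of Table IV: elements are grouped per column rather than per row, only three-input adders are used during accumulation, and the carry of a three-trit addition is allowed to take the value 2, which may differ from the adder cells of the paper. The names `to_trits` and `wallace_multiply` are illustrative. The sketch only demonstrates that the reduction preserves the product.

```python
import random

def to_trits(x, n):
    """Unbalanced ternary digits of x (0, 1, 2), least significant first."""
    return [(x // 3**i) % 3 for i in range(n)]

def wallace_multiply(a, b, n=6):
    """Column-wise Wallace-style reduction; returns (product, stage count)."""
    A, B = to_trits(a, n), to_trits(b, n)
    cols = [[] for _ in range(2 * n + 2)]

    # Stage 0: each trit pair yields a Product and a Carry of higher weight.
    for i in range(n):
        for j in range(n):
            p = A[i] * B[j]
            cols[i + j].append(p % 3)
            cols[i + j + 1].append(p // 3)

    # Accumulation: compress groups of three trits until every column has <= 2.
    stages = 0
    while max(len(c) for c in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, col in enumerate(cols):
            full = len(col) // 3
            for g in range(full):
                total = sum(col[3 * g:3 * g + 3])
                nxt[k].append(total % 3)       # sum stays in the same column
                nxt[k + 1].append(total // 3)  # carry moves to the next column
            nxt[k].extend(col[3 * full:])      # leftovers pass through
        cols = nxt
        stages += 1

    # Final addition: ripple carry-propagate adder over the remaining rows.
    result, carry = 0, 0
    for k, col in enumerate(cols):
        total = sum(col) + carry
        result += (total % 3) * 3**k
        carry = total // 3
    assert carry == 0
    return result, stages

# Self-check against integer multiplication for random 6-trit operands.
random.seed(0)
for _ in range(200):
    a, b = random.randrange(3**6), random.randrange(3**6)
    assert wallace_multiply(a, b)[0] == a * b
```

Because the largest operand is $3^6 - 1 = 728$ and $728^2 = 529984 < 3^{12}$, the product always fits in 12 trits, in line with the observation above that the 13th trit is zero.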

### B. Design of Balanced Ternary Wallace Tree Multiplier

Balanced ternary multiplication is preferred because it is an inherently signed arithmetic operation. The most significant trit does not have to carry the sign, so a higher data capacity can be achieved [38], [39].
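The signed nature of the balanced representation can be illustrated with a small conversion sketch. The helper names `to_balanced` and `from_balanced` are illustrative. Every trit contributes to the magnitude, and negating a number amounts to inverting each trit.

```python
def to_balanced(x, n):
    """Balanced ternary digits (-1, 0, 1) of x, least significant first."""
    trits = []
    for _ in range(n):
        r = (x + 1) % 3 - 1   # remainder mapped into {-1, 0, 1}
        trits.append(r)
        x = (x - r) // 3
    if x != 0:
        raise ValueError("value does not fit in n trits")
    return trits

def from_balanced(trits):
    """Integer value of a list of balanced trits (least significant first)."""
    return sum(t * 3**i for i, t in enumerate(trits))

# Six trits cover the symmetric range -364 ... 364 without a sign trit.
assert to_balanced(364, 6) == [1, 1, 1, 1, 1, 1]
assert to_balanced(-364, 6) == [-1, -1, -1, -1, -1, -1]

# Negation is a trit-wise inversion.
assert to_balanced(-5, 6) == [-t for t in to_balanced(5, 6)]
```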

**Figure 10.** (a) Comparisons between the proposed balanced Wallace tree multiplier and the classical array multiplier regarding propagation delay, average power consumption, and PDP. (b) Comparisons of the proposed unbalanced Wallace tree multiplier and the classical array multiplier. The inset table presents the average improvements of the proposed ternary multipliers in delay and PDP as compared to the classical multipliers.

Based on the same methodology, a balanced ternary Wallace tree multiplier is implemented. The 1-trit by 1-trit balanced ternary ($-1$, 0, 1) multiplication only has the Product signal; the Carry signal is not needed. Therefore, in the N-trit by N-trit balanced ternary ($-1$, 0, 1) multiplication, $N^2$ elements are produced during the PP generation stage. The proposed flow of a 6-trit by 6-trit balanced ternary Wallace tree multiplier is shown in Table V, which consists of 36 elements and is divided into four stages of addition. As compared with an unbalanced ternary Wallace tree multiplier, fewer addition stages are involved in the accumulation of PP due to the reduced number of PP (36 vs. 72). Lastly, a 7-trit CPA is used to calculate the final multiplication result.
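The absence of a Carry signal in the 1-trit balanced multiplier can be checked with a brief sketch of the PP generation stage. The names `to_balanced` and `partial_products` and the example operands are illustrative assumptions; the accumulation itself is not modelled here, only the number, range, and weighted sum of the partial products.

```python
def to_balanced(x, n):
    """Balanced ternary digits (-1, 0, 1) of x, least significant first."""
    trits = []
    for _ in range(n):
        r = (x + 1) % 3 - 1
        trits.append(r)
        x = (x - r) // 3
    return trits

def partial_products(a, b, n=6):
    """Group the 1-trit products A[i] * B[j] by their weight i + j."""
    A, B = to_balanced(a, n), to_balanced(b, n)
    cols = [[] for _ in range(2 * n - 1)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(A[i] * B[j])   # always in {-1, 0, 1}
    return cols

cols = partial_products(-200, 157)

# N^2 = 36 elements, each a single trit, so no Carry signal is required.
assert sum(len(c) for c in cols) == 36
assert all(t in (-1, 0, 1) for c in cols for t in c)

# The weighted sum of the partial products equals the signed product.
assert sum(sum(c) * 3**k for k, c in enumerate(cols)) == -200 * 157
```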

### C. Results and Comparisons

The proposed 6-trit by 6-trit balanced and unbalanced ternary Wallace tree multipliers are verified with a large number of random inputs. The correct logic functions are achieved. The performance is tested under different operation frequencies ranging from 100 MHz to 1 GHz. The classical 6-trit by 6-trit ternary multipliers with the array structure are used for the performance comparison. Figure 10 compares the delay, power, and PDP of the proposed balanced/unbalanced ternary Wallace tree multipliers and classical array multipliers.

Although the power consumption of the ternary Wallace tree multiplier is slightly higher because more adders are involved in the summation stages, the propagation delay is significantly reduced, benefiting from the parallel element summation. As presented in the inset table in Figure 10, over the frequency range from 100 MHz to 1 GHz, the delay of the balanced ternary Wallace tree multipliers is reduced by 22.0% on average as compared to the classical multipliers. Furthermore, the average improvement in the delay exceeds 45.5% when comparing the unbalanced Wallace tree ternary multipliers to the classical multipliers. Accordingly, the PDP of the proposed multipliers is lowered. An average improvement of 17.5% and 36.8% in PDP performance is achieved in the balanced and unbalanced ternary Wallace tree multipliers, respectively. The improvement in PDP is more obvious at higher frequencies. Specifically, the PDP of the balanced Wallace tree multiplier is reduced by 27.5% and 24.0% at 800 MHz and 1 GHz, respectively, and the PDP of the unbalanced Wallace tree multiplier is lowered by 40.6% and 32.8% at 800 MHz and 1 GHz, respectively.

**TABLE IV**  
PROPOSED FLOW OF THE 6-TRIT BY 6-TRIT UNBALANCED TERNARY WALLACE TREE MULTIPLIER

Note. Every rectangle means a ternary half adder or a ternary full adder.

**TABLE V**  
PROPOSED FLOW OF THE 6-TRIT BY 6-TRIT BALANCED TERNARY WALLACE TREE MULTIPLIER

We further compared our Wallace tree multipliers more comprehensively with the reported multipliers in [24], [26], [27], [40], [41]. As shown in Table VI, our Wallace tree multipliers show better performance than most reported works in terms of PDP. As a result of the higher delay and power of the 3:1 MUX-based adders used in [24] and the 2:1 MUX-based adders used in [27], their 6-trit multipliers show more than two and three times the PDP of our unbalanced multiplier, respectively. The multiplier in [26] has a 20% higher PDP due to the higher delay and the static power consumption. Additionally, within the same number of addition stages, our multiplier has fewer ternary adders and exhibits a higher utilization efficiency as compared with [40], achieving a 76.3% lower delay and 73.6% lower PDP. Work [41] achieves a lower PDP at the expense of computation accuracy. Among the exact multipliers, the 23.77 fJ PDP achieved by our unbalanced ternary multiplier and the 18.299 fJ PDP achieved by our balanced ternary multiplier are the lowest.

**TABLE VI**  
PERFORMANCE COMPARISONS OF 6-TRIT BY 6-TRIT TERNARY MULTIPLIERS

| Design                      | Delay (ps) | Power (μW) | PDP (fJ) | Errors in computation |
|-----------------------------|------------|------------|----------|-----------------------|
| [24]                        | 740        | 65.719     | 48.633   | no                    |
| [26]                        | 1144       | 24.956     | 28.550   | no                    |
| [27]                        | 1723       | 44.419     | 76.534   | no                    |
| [40]                        | 2732       | 32.997     | 90.131   | no                    |
| [41]                        | 2518       | 8.117      | 20.439   | yes                   |
| [41]                        | 2346       | 6.948      | 16.300   | yes                   |
| [41]                        | 2346       | 6.605      | 15.495   | yes                   |
| This work (unbalanced mode) | 647        | 36.69      | 23.77    | no                    |
| This work (balanced mode)   | 1300.6     | 14.07      | 18.299   | no                    |
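The PDP column of Table VI and the percentages quoted above can be cross-checked with a few lines of arithmetic. The sketch assumes that the power values are in microwatts, since delay (ps) times power (μW) is the combination that reproduces the tabulated PDP in fJ. The helper names `pdp_fj` and `reduction_percent` are illustrative, and the rows are identified by their values rather than by reference number.

```python
def pdp_fj(delay_ps, power_uw):
    """PDP in fJ from delay in ps and power in uW (1 ps * 1 uW = 1e-3 fJ)."""
    return delay_ps * power_uw * 1e-3

def reduction_percent(ours, reference):
    """Relative reduction of 'ours' with respect to 'reference', in percent."""
    return (reference - ours) / reference * 100

# (delay in ps, power in uW, tabulated PDP in fJ), in the row order of Table VI.
rows = [
    (740, 65.719, 48.633), (1144, 24.956, 28.550), (1723, 44.419, 76.534),
    (2732, 32.997, 90.131), (2518, 8.117, 20.439), (2346, 6.948, 16.300),
    (2346, 6.605, 15.495), (647, 36.69, 23.77), (1300.6, 14.07, 18.299),
]

# Every tabulated PDP agrees with delay * power to within 0.5%.
for delay, power, pdp in rows:
    assert abs(pdp_fj(delay, power) - pdp) / pdp < 0.005

# Unbalanced multiplier of this work versus the 2732 ps design.
print(round(reduction_percent(647, 2732), 1))      # delay reduction in %
print(round(reduction_percent(23.77, 90.131), 1))  # PDP reduction in %
```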



**Figure 11.** (a) Schematic of a standard ternary inverter (STI) based on CMOS180 technology. MOSFETs with three kinds of threshold voltages are involved, including low-threshold-voltage N/PMOS (LVT\_NMOS: $V_{th} = 0.292$ V; LVT\_PMOS: $V_{th} = -0.1005$ V), middle-threshold-voltage N/PMOS (MVT\_NMOS: $V_{th} = 0.4185$ V; MVT\_PMOS: $V_{th} = -0.424$ V) and high-threshold-voltage N/PMOS (HVT\_NMOS: $V_{th} = 0.75619$ V; HVT\_PMOS: $V_{th} = -0.69515$ V). (b) Layout design of the STI; the total area is $8.837\ \mu\text{m} \times 8.992\ \mu\text{m}$. (c) Post-layout simulation of the STI.

## VI. SILICON CMOS BASED TERNARY LOGIC CIRCUITS

To date, almost all ternary circuits have been constructed and simulated using CNTFET SPICE models. However, it is impractical to precisely control the CNT diameters and hence the threshold voltages [42]. To implement a real ternary hardware system, the highest priority is to use commercially available silicon devices. Since the 1950s, integrated circuits have mainly been based on silicon transistors due to their high performance and rather low cost. Besides, the device threshold voltage can be well controlled by the impurity diffusion process in silicon. Therefore, it is feasible to construct the novel ternary logic cells using silicon devices.

By utilizing the CMOS 180 nm (CMOS180) Process Design Kit (PDK) provided by Semiconductor Manufacturing International Corporation (SMIC), we have demonstrated the feasibility of realizing ternary logic gates and systems using commercial silicon technology. By adopting silicon NMOS and PMOS with three kinds of threshold voltages (low/middle/high-threshold-voltage N/PMOS), an STI is constructed. Figure 11 shows the schematic, layout design and post-layout simulation of the STI. The specific values of the threshold voltages are given in the caption of Figure 11. Extensive simulations are performed at 100 MHz, from which the power consumption and propagation delay are extracted to be 0.114 μW and 0.357 ns, respectively, giving a PDP of 0.0407 fJ. Similarly, the PTI and NTI are constructed and simulated based on CMOS180 technology.

**TABLE VII**  
COMPARISONS OF SILICON TRANSISTORS BASED BINARY AND TERNARY INVERTERS

| Silicon CMOS 180 device | Binary inverter: INV(mvt.18) | Binary inverter: INV.(33) | Ternary inverter: PTI | Ternary inverter: NTI | Ternary inverter: STI |
|-------------------------|------------------------------|---------------------------|-----------------------|-----------------------|-----------------------|
| Delay (ps)              | 348.4                        | 302.1                     | 260.4                 | 144                   | 357                   |
| Average power (μW)      | 0.0835                       | 0.0894                    | 0.0284                | 0.0297                | 0.114                 |
| PDP (aJ)                | 29.09                        | 27.01                     | 7.4                   | 4.3                   | 40.7                  |

Note. Test condition: $VDD = 1$ V @ CMOS180.



**Figure 12.** Schematic (a), layout design (c) and post-layout simulation of the proposed unbalanced ternary half adder.

Table VII compares the binary inverters and the ternary inverters based on the same PDK (tested under $VDD = 1$ V). The PTI and NTI show large improvements in delay and average power consumption, and achieve PDP improvements of 72.60% and 84.08%, respectively, over the binary inverter with the lowest delay and PDP. Even though the STI needs four more transistors than a binary inverter, the proposed CMOS-based STI achieves comparable performance in delay, power consumption and PDP.
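The figures of Table VII can be reproduced with a short calculation. The sketch assumes delays in ps and average powers in μW, whose product is in attojoules (1 ps × 1 μW = 1 aJ); with these units the STI value equals the 0.0407 fJ quoted in the text. The helper names `pdp_aj` and `improvement_percent` are illustrative.

```python
def pdp_aj(delay_ps, power_uw):
    """PDP in aJ from delay in ps and power in uW (1 ps * 1 uW = 1e-18 J)."""
    return delay_ps * power_uw

def improvement_percent(ours, reference):
    """Relative PDP reduction with respect to the reference, in percent."""
    return (reference - ours) / reference * 100

# STI: 357 ps and 0.114 uW give about 40.7 aJ, i.e. 0.0407 fJ.
sti = pdp_aj(357, 0.114)
assert abs(sti - 40.7) < 0.05
assert abs(sti * 1e-3 - 0.0407) < 1e-4

# PTI and NTI versus the binary inverter with the lowest PDP (27.01).
print(round(improvement_percent(7.4, 27.01), 2))   # PTI improvement in %
print(round(improvement_percent(4.3, 27.01), 2))   # NTI improvement in %
```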

We further compare our work with another prominent work on silicon-based ternary logic proposed by Samsung and UNIST [43]. According to our extensive post-layout simulations, our design has a significant advantage in operation frequency (100 MHz vs. sub-kHz), which makes our ternary devices suitable for a wider range of application scenarios.

By utilizing the CMOS180 technology, we further established and simulated an unbalanced ternary half adder, as shown in Figure 12. The same design as proposed in Section III is adopted here. A Sum gate and a Carry gate are constructed from silicon transistors to constitute the unbalanced ternary half adder. A total of 56 transistors are involved in the half adder, and the total layout area is 32.374 μm × 30.62 μm (shown in Figure 12(c)). Based on the post-layout simulations, the average power consumption, delay, and PDP are 0.985 μW, 1.199 ns and 1.182 fJ, respectively. A direct comparison between binary and ternary adders is neither feasible nor meaningful. As more transistors are involved in a ternary adder unit, a ternary adder inherently has a higher PDP. However, considering the larger data volume per digit carried by ternary logic, the total PDP of binary and ternary logic will converge as the data depth increases [44]. Additionally, when the power consumption and delay caused by the interconnections within a chip are considered, the advantages of ternary logic will prevail.
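The larger data volume per digit can be quantified with a small sketch. It does not model PDP or interconnect; it only compares how many digits each radix needs for the same value range. The helper name `digits_needed` and the 32-bit example are illustrative assumptions.

```python
import math

def digits_needed(max_value, radix):
    """Smallest digit count d such that radix**d exceeds max_value."""
    d = 1
    while radix ** d <= max_value:
        d += 1
    return d

# One trit carries log2(3) bits of information.
bits_per_trit = math.log2(3)
assert abs(bits_per_trit - 1.585) < 1e-3

# Covering the 32-bit unsigned range takes 32 bits but only 21 trits.
max_value = 2**32 - 1
assert digits_needed(max_value, 2) == 32
assert digits_needed(max_value, 3) == 21
```

The ratio of digit counts approaches $\log_2 3 \approx 1.585$ as the word length grows, so each ternary digit, and each wire carrying it, replaces about 1.585 binary ones.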

## VII. CONCLUSION

In this paper, high-performance unbalanced/balanced ternary adders with reduced PDP and transistor count have been realized based on ternary cycling gates. Average improvements of 16.1% and 22.8% are realized in the unbalanced and the balanced ternary full adders, respectively. Besides, two kinds of high-speed parallel ternary Wallace tree multipliers are presented and implemented based on the proposed ternary adders, and they show better performance than the reported works. Average improvements of 17.5% and 36.8% in PDP are achieved in the balanced and unbalanced ternary Wallace tree multipliers, respectively, as compared to the classical array multipliers. Moreover, the proposed methods have also been successfully explored using standard silicon CMOS 180 nm devices for the first time. The post-layout simulation results show performance comparable to that of the binary device, indicating great potential for practical application. Larger ternary arithmetic circuits, such as a vector-matrix multiplication processing element, and their physical implementation are left for future work.

## REFERENCES

- [1] J. Tang, T. Ma, and Q. Luo, "Trends prediction of Big Data: A case study based on fusion data," *Procedia Comput. Sci.*, vol. 174, pp. 181–190, 2020.
- [2] J. Wang, C. Xu, J. Zhang, and R. Zhong, "Big data analytics for intelligent manufacturing systems: A review," *J. Manuf. Syst.*, vol. 62, pp. 738–752, 2021.
- [3] T. N. Theis and H.-S. P. Wong, "The end of Moore's law: A new beginning for information technology," *Comput. Sci. Eng.*, vol. 19, no. 2, pp. 41–50, 2017.
- [4] N. Magen, A. Kolodny, U. Weiser, and N. Shamir, "Interconnect-power dissipation in a microprocessor," in *Proc. Int. Workshop System Level Interconnect Prediction*, 2004, pp. 7–13.
- [5] C.-Y. Wu and H.-Y. Huang, "Design and application of pipelined dynamic CMOS ternary logic and simple ternary differential logic," *IEEE J. Solid-State Circuits*, vol. 28, no. 8, pp. 895–906, Aug. 1993.
- [6] M. Huang, X. Wang, G. Zhao, P. Coquet, and B. Tay, "Design and implementation of ternary logic integrated circuits by using novel two-dimensional materials," *Appl. Sci.*, vol. 9, no. 20, 2019, Art. no. 4212.
- [7] X. Wu and F. Prosser, "CMOS ternary logic circuits," *IEE Proc. G. Electron. Circuits Syst.*, vol. 137, pp. 21–27, 1990.
- [8] M. Khalid and J. Singh, "Memristor based unbalanced ternary logic gates," *Analog Integr. Circuits Signal Process.*, vol. 87, no. 3, pp. 399–406, 2016.
- [9] M. Huang, S. Li, Z. Zhang, X. Xiong, X. Li, and Y. Wu, "Multifunctional high-performance van der Waals heterostructures," *Nature Nanotechnol.*, vol. 12, no. 12, pp. 1148–1154, 2017.
- [10] J. Shim et al., "Phosphorene/rhenium disulfide heterojunction-based negative differential resistance device for multi-valued logic," *Nature Commun.*, vol. 7, no. 1, pp. 1–8, 2016.
- [11] L. Lee et al., "ZnO composite nanolayer with mobility edge quantization for multi-value logic transistors," *Nature Commun.*, vol. 10, no. 1, pp. 1–9, 2019.
- [12] G. Zhao et al., "Ternary logics based on 2D ferroelectric-incorporated 2D semiconductor field effect transistors," *Front. Mater.*, vol. 9, 2022, Art. no. 872909.
- [13] W. Huang et al., "Ternary logic circuit based on negative capacitance field-effect transistors and its variation immunity," *IEEE Trans. Electron Devices*, vol. 68, no. 7, pp. 3678–3683, Jul. 2021.
- [14] S. Lin, Y.-B. Kim, and F. Lombardi, "CNTFET-based design of ternary logic gates and arithmetic circuits," *IEEE Trans. Nanotechnol.*, vol. 10, no. 2, pp. 217–225, Mar. 2011.
- [15] S. Kim, S.-Y. Lee, S. Park, K. R. Kim, and S. Kang, "A logic synthesis methodology for low-power ternary logic circuits," *IEEE Trans. Circuits Syst. I: Regular Papers*, vol. 67, no. 9, pp. 3138–3151, Sep. 2020.
- [16] B. Srinivasu and K. Sridharan, "A synthesis methodology for ternary logic circuits in emerging device technologies," *IEEE Trans. Circuits Syst. I: Regular Papers*, vol. 64, no. 8, pp. 2146–2159, Aug. 2017.
- [17] R. A. Jaber, A. Kassem, A. M. El-Hajj, L. A. El-Nimri, and A. M. Haidar, "High-performance and energy-efficient CNFET-based designs for ternary logic circuits," *IEEE Access*, vol. 7, pp. 93871–93886, 2019.
- [18] J. M. Aljaam, R. A. Jaber, and S. A. Al-Maadeed, "Novel ternary adder and multiplier designs without using decoders or encoders," *IEEE Access*, vol. 9, pp. 56726–56735, 2021.
- [19] S. Firouzi, S. Tabrizchi, F. Sharifi, and A.-H. Badawy, "High performance, variation-tolerant CNFET ternary full adder a process, voltage, and temperature variation-resilient design," *Comput. Elect. Eng.*, vol. 77, pp. 205–216, 2019.
- [20] D. M. Miller and M. A. Thornton, "Multiple valued logic: Concepts and representations," *Synth. Lectures Digit. Circuits Syst.*, vol. 2, no. 1, pp. 1–127, 2007.
- [21] P. K. Seyyed Ashkan Ebrahimi and M. SaeidSorouri, "Low power CNTFET-based ternary full adder cell for nanoelectronics," *Int. J. Soft Comput. Eng.*, vol. 2, no. 2, pp. 291–295, 2012.
- [22] P. Keshavarzian and R. Sarikhani, "A novel CNTFET-based ternary full adder," *Circuits Syst. Signal Process.*, vol. 33, no. 3, pp. 665–679, 2014.
- [23] A. G. Asibelagh and R. F. Mirzaee, "Partial ternary full adder versus complete ternary full adder," in *Proc. Int. Conf. Elect. Commun. Comput. Eng.*, 2020, pp. 1–6.
- [24] B. Srinivasu and K. Sridharan, "A synthesis methodology for ternary logic circuits in emerging device technologies," *IEEE Trans. Circuits Syst. I: Regular Papers*, vol. 64, no. 8, pp. 2146–2159, Aug. 2017.
- [25] R. Faghih Mirzaee, K. Navi, and N. Bagherzadeh, "High-efficient circuits for ternary addition," *VLSI Des.*, vol. 2014, pp. 1–15, 2014.
- [26] S. Kim, T. Lim, and S. Kang, "An optimal gate design for the synthesis of ternary logic circuits," in *Proc. 23rd Asia South Pacific Des. Automat. Conf.*, 2018, pp. 476–481.
- [27] C. Vudadha, A. Surya, S. Agrawal, and M. Srinivas, "Synthesis of ternary logic circuits using 2: 1 multiplexers," *IEEE Trans. Circuits Syst. I: Regular Papers*, vol. 65, no. 12, pp. 4313–4325, Dec. 2018.
- [28] L.-M. Peng, Z. Zhang, and C. Qiu, "Carbon nanotube digital electronics," *Nature Electron.*, vol. 2, no. 11, pp. 499–505, 2019.
- [29] K. Sridharan, B. Srinivasu, and V. Pudi, *Low-Complexity Arithmetic Circuit Design in Carbon Nanotube Field Effect Transistor Technology*. Berlin, Germany: Springer, 2020.
- [30] A. Dhande, V. Ingole, and V. Ghiye, *Ternary Digital System: Concepts and Applications*. Austin, TX, USA: SM Online Publishers, 2014.
- [31] J. Deng and H.-S. P. Wong, "A compact SPICE model for carbon-nanotube field-effect transistors including nonidealities and its application—Part I: Model of the intrinsic channel region," *IEEE Trans. Electron Devices*, vol. 54, no. 12, pp. 3186–3194, Dec. 2007.
- [32] D. W. Jones, "Fast ternary addition," Dept. Comput. Sci., Univ. Iowa, Iowa City, IA, USA, 2013.
- [33] D. W. Jones, "Standard ternary logic," Feb. 11, 2013.
- [34] Z. Sun and R. Huang, "Time complexity of in-memory matrix-vector multiplication," *IEEE Trans. Circuits Syst. II: Exp. Briefs*, vol. 68, no. 8, pp. 2785–2789, Aug. 2021.
- [35] T.-J. Yang, Y.-H. Chen, J. Emer, and V. Sze, "A method to estimate the energy consumption of deep neural networks," in *Proc. 51st Asilomar Conf. Signals Syst. Comput.*, 2017, pp. 1916–1920.
- [36] C. S. Wallace, "A suggestion for a fast multiplier," *IEEE Trans. Electron. Comput.*, vol. EC-13, no. 1, pp. 14–17, Feb. 1964.
- [37] M. J. Rao and S. Dubey, "A high speed and area efficient Booth recoded Wallace tree multiplier for fast arithmetic circuits," in *Proc. Asia Pacific Conf. Postgraduate Res. Microelectronics Electron.*, 2012, pp. 220–223.
- [38] B. Parhami, "Truncated ternary multipliers," *IET Comput. Digit. Techn.*, vol. 9, no. 2, pp. 101–105, 2015.
- [39] Y. Kang et al., "A novel ternary multiplier based on ternary CMOS compact model," in *Proc. IEEE 47th Int. Symp. Mult.-Valued Log.*, 2017, pp. 25–30.

- [40] S. Tabrizchi, A. Panahi, F. Sharifi, H. Mahmoodi, and A.-H. A. Badawy, "Energy-efficient ternary multipliers using CNT transistors," *Electronics*, vol. 9, no. 4, 2020, Art. no. 643.
- [41] S. Kim, Y. Kang, S. Baek, Y. Choi, and S. Kang, "Low-power ternary multiplication using approximate computing," *IEEE Trans. Circuits Syst. II: Exp. Briefs*, vol. 68, no. 8, pp. 2947–2951, Aug. 2021.
- [42] S. Banerjee, A. Chaudhuri, and K. Chakrabarty, "Analysis of the impact of process variations and manufacturing defects on the performance of carbon-nanotube FETs," *IEEE Trans. Very Large Scale Integration (VLSI) Syst.*, vol. 28, no. 6, pp. 1513–1526, Jun. 2020.
- [43] J. W. Jeong et al., "Tunnelling-based ternary metal-oxide-semiconductor technology," *Nature Electron.*, vol. 2, no. 7, pp. 307–312, 2019.
- [44] J. Yoon, S. Baek, S. Kim, and S. Kang, "Optimizing ternary multiplier design with fast ternary adder," *IEEE Trans. Circuits Syst. II: Exp. Briefs*, vol. 70, no. 2, pp. 766–770, Feb. 2023.



**Guangchao Zhao** received the BS degree in microelectronic science and engineering from Wuhan University, China, in 2019. He is currently working toward the PhD degree with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. His research interests include multi-valued logic, integrated circuit design and simulation, implementation of ternary logic gates based on emerging electronic devices, and in-memory-computing architectures and systems.



**Zhiwei Zeng** received the BS degree in electronic and information engineering from the Zhengzhou University, China, in 2017. He is currently working toward the MS degree with Southern University of Science and Technology, Shenzhen, China. His current research interests include VLSI physical design optimization and ternary logic circuit.



**Xingli Wang** received the BS degree from Jilin University, China, in 2010, and the PhD degree from Nanyang Technological University, Singapore, in 2016. He is now a senior research fellow with CNRS International – NTU – Thales Research Alliance (CINTRA). He is working on the synthesis of 2D materials and their heterostructures, such as MoS<sub>2</sub> and WS<sub>2</sub>, and exploring their application in field effect transistors, tunneling devices and thermoelectric devices. He has published 46 peer-reviewed papers, and his current H-index is 25.



**Abdelrahman G. Qoutb** received the BS degree in communication and electronics engineering from Fayoum University, in 2014, and the PhD degree from the School of Electrical and Computer Engineering, University of Rochester, USA, in 2022. He is now a research and development failure analysis engineer with Intel. His research interests include VLSI, ASIC, nanoelectronic device, and modeling.



**Philippe Coquet** received the PhD degree in electrical and electronic engineering from the University of Rennes 1, France, in 1993. From 1993 to 1994, he was a postdoctoral researcher with the National Institute of Information and Communications Technology (NICT) in Tokyo. In 1994, he was appointed as an assistant professor with Ecole Normale Supérieure de Cachan, France. From 2002 to 2005, he was seconded to the University of Tokyo in the frame of the Laboratory for Integrated Micro-Mechatronic Systems (LIMMS/CNRS). In 2005, he was appointed as a professor with the University of Lille 1 and joined the Institute for Electronics Microelectronics and Nanotechnology (IEMN/CNRS). He has been the director of CINTRA since September 2013 and is an adjunct professor with the School of Electrical & Electronic Engineering, NTU. His research interests include millimeter wave devices and micro- and nanotechnologies for RF applications.



**Eby G. Friedman** (Fellow, IEEE) received the BS degree in electrical engineering from Lafayette College, Easton, PA, USA, in 1979, and the MS and PhD degrees in electrical engineering from the University of California at Irvine, Irvine, CA, USA, in 1981 and 1989, respectively. He is the author of more than 500 articles, book chapters, and 19 patents and the author or an editor of 18 books in the fields of high-speed and low-power CMOS design techniques, 3-D design methodologies, high-speed interconnect, and the theory and application of synchronous clock and power distribution networks. His current research and teaching interests include high performance synchronous digital and mixed-signal micro-electronic design and analysis with application to high-speed portable processors, low-power wireless communications, and server farms.



**Beng Kang Tay** (Senior Member, IEEE) received the BEng (Hons) and MSc degrees from the National University of Singapore in 1985 and 1989, respectively, and the PhD degree from the School of Electrical and Electronic Engineering (EEE), NTU, in 1999. He is a full professor with Nanyang Technological University (NTU), Singapore. His research focuses on the synthesis and applications of low dimensional materials, such as carbon nanofilms, carbon nanotubes and 2D materials (especially transition metal chalcogenides). To date, he has published more than 400 journal papers with Google Scholar H index of 60. Currently, he is the associate chair with School of Electrical and Electronic Engineering, NTU, and the deputy director with CNRS International – NTU – Thales Research Alliance (CINTRA).



**Mingqiang Huang** received the BEng and PhD degrees in physics from the Huazhong University of Science and Technology, Wuhan, China, in 2013 and 2018, respectively. From 2018 to 2019, he was a research fellow with Nanyang Technological University, Singapore, focusing on energy-efficient micro-electronics, and logic circuits. Since November 2019, he has been with the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, as a research Associate Professor. His current research interests include memristor, memristor circuits, and artificial intelligence (AI) hardware accelerators.