

# Power Reduction Techniques in CMOS Gates

*Final Report*

EC802 Low Power VLSI Design

*by*

Bhimreddy Yarabandi (201EC170)

Sampath N (201EC158)

Rohit Kumar (201EC154)

*under the guidance of*

Dr Kalpana Bhat



DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING  
NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA  
SURATHKAL, MANGALORE - 575025

# Power Reduction Techniques in CMOS Gates

Bhimaraddy B Y (201EC170), Sampath N (201EC158), and Rohit Kumar (201EC154)

Department of ECE, National Institute of Technology Karnataka-575025, India

**Abstract**—With the increasing demand for low-power digital circuits, low-power techniques for CMOS gates have become a critical research area in microelectronics. This report presents an overview of low-power techniques that reduce the power consumption of CMOS gates at the cost of a modest performance penalty. It discusses design techniques such as CGTx (Control Gate Transistor), LECTOR, and multi-$V_{th}$ design. The report also presents a comparative analysis of these techniques on an inverter, inverter chains of length 3 and 5, a NAND2 gate, and a 2-bit comparator, based on their power savings, performance impact, and implementation complexity. Overall, the report provides insights for designers who aim to achieve low-power digital circuits, especially for battery-operated and portable devices.

**Keywords**—CMOS gates, CGTx, LECTOR, Comparator

## I. INTRODUCTION

In today's technology-driven world, electronic devices are becoming more prevalent, portable, and energy-efficient. With the proliferation of mobile devices, Internet of Things (IoT) devices, and other battery-powered applications, reducing power consumption has become a critical design constraint. In this context, low-power techniques for CMOS gates have gained significant attention as a promising solution to reduce the power consumption of digital circuits without sacrificing performance.

The complementary metal-oxide-semiconductor (CMOS) technology is widely used in digital circuit design due to its excellent performance and scalability. However, the power consumption of CMOS gates is a major concern, especially for battery-operated devices. Therefore, there is a growing need for developing low-power techniques to reduce the power consumption of CMOS gates.

The focus of this report is, therefore, to reduce power consumption in an inverter, inverter chains, a NAND2 gate, and a 2-bit comparator using the CGTx and LECTOR techniques.

## II. LECTOR TECHNIQUE

### A. Preliminaries

In this section, we briefly describe the models used in this work for estimating power dissipation for short-channel MOSFETs. The leakage current calculation is not straightforward due to the highly nonlinear behavior of the drain current of the device with respect to source/drain voltages. In the BSIM model, the threshold voltage is expressed as

$$V_t = V_{FB} + \phi_s + k_1\sqrt{\phi_s} - k_2\phi_s - \eta V_{dd} \quad (1)$$

where $V_{FB}$ is the flatband voltage, $\phi_s$ is twice the Fermi potential, $k_1$ and $k_2$ represent the nonuniform doping effect, and $\eta$ models the drain-induced barrier lowering (DIBL) effect, an undesirable punch-through current flowing between the source and drain below the surface of the channel. The leakage current of an NMOS transistor operating in the weak-inversion region (i.e., $V_{gs}=0$) is given by

$$I_s = I_0 \exp\left(\frac{(V_{gs} - V_t)}{nV_T}\right) \left(1 - \exp\left(-\frac{V_{ds}}{V_T}\right)\right) \quad (2)$$
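As a rough numerical illustration of Eq. (2), the sketch below evaluates the weak-inversion current for a few threshold voltages. It takes $V_T = kT/q$ as the thermal voltage and $n$ as the subthreshold swing coefficient. The values chosen for `i0`, `n`, and the bias voltages are illustrative assumptions, not parameters extracted from the simulations in this report.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant, J/K
Q_ELECTRON = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_k=300.0):
    """Thermal voltage V_T = kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def subthreshold_current(v_gs, v_ds, v_t, i0=1e-7, n=1.5, temp_k=300.0):
    """Weak-inversion drain current following Eq. (2).

    i0 and n are assumed, illustrative values (not taken from the report).
    """
    v_therm = thermal_voltage(temp_k)
    gate_term = math.exp((v_gs - v_t) / (n * v_therm))
    drain_term = 1.0 - math.exp(-v_ds / v_therm)
    return i0 * gate_term * drain_term


if __name__ == "__main__":
    # An OFF transistor (V_gs = 0) with the full supply across it.
    for v_t in (0.2, 0.3, 0.4):
        i_leak = subthreshold_current(v_gs=0.0, v_ds=1.0, v_t=v_t)
        print(f"Vt = {v_t:.1f} V -> I_s = {i_leak:.3e} A")
```

The exponential dependence on $V_t$ is what makes high-threshold and stacked devices attractive for leakage control: a modest increase in threshold voltage lowers the leakage current by orders of magnitude.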



Fig. 1. Two-input LCT NAND gate. The leakage control transistors LCT1 and LCT2 are inserted between nodes N1 and N2 and act as self-controlled stacked transistors.

### B. Lector Technique operation (in NAND2)

The basic idea behind the LECTOR approach to leakage power reduction is the effective stacking of transistors in the path from the supply voltage to ground: “A state with more than one transistor OFF in a path from supply voltage to ground is far less leaky than a state with only one transistor OFF in any supply to ground path.” The method introduces two leakage control transistors (LCTs) in each CMOS gate such that one of the LCTs is always near its cutoff region of operation. We illustrate the Leakage Control Transistor technique (LECTOR) with the case of a NAND gate. A CMOS NAND gate with two added leakage control transistors is shown in Fig. 1.

Two leakage control transistors, $LCT_1$ (PMOS) and $LCT_2$ (NMOS), are introduced between nodes N1 and N2 of the pull-up and pull-down logic of the NAND gate. The drain nodes of $LCT_1$ and $LCT_2$ are connected together to form the output node of the NAND gate. The source nodes of the transistors are connected to nodes N1 and N2 of the pull-up and pull-down logic, respectively. The switching of $LCT_1$ and $LCT_2$ is controlled by the voltage potentials at nodes N2 and N1, respectively. This wiring configuration ensures that one of the LCTs is always near its cutoff region, irrespective of the input vector applied to the NAND gate.

TABLE I  
STATE MATRIX OF TWO-INPUT LCT NAND GATE (columns are input vectors $(A_{in}, B_{in})$)

| Transistor reference | (0,0)              | (0,1)              | (1,0)              | (1,1)              |
|----------------------|--------------------|--------------------|--------------------|--------------------|
| $M_1$                | On state           | On state           | Off state          | Off state          |
| $M_2$                | On state           | Off state          | On state           | Off state          |
| $LCT_1$              | Near Cut-Off state | Near Cut-Off state | Near Cut-Off state | On state           |
| $LCT_2$              | On state           | On state           | On state           | Near Cut-Off state |
| $M_3$                | Off state          | Off state          | On state           | On state           |
| $M_4$                | Off state          | On state           | Off state          | On state           |

Consider the dc characteristics of the LCT NAND gate. When $A_{in}=1$ V and $B_{in}=0$ V, the voltage at node N2 is 800 mV. This voltage is not sufficient to turn $LCT_1$ completely OFF. Hence, the resistance of $LCT_1$ is lower than its OFF-state resistance, allowing conduction. Even though the resistance of $LCT_1$ is not as high as its OFF-state resistance, it increases the resistance of the Vdd-to-ground path, which limits the leakage current and reduces leakage power. Similarly, when $A_{in}=1$ V and $B_{in}=1$ V, the voltage at node N1 is 200 mV, operating $LCT_2$ near its cutoff region. The states of the transistors for all possible input vectors of the LCT NAND gate are tabulated in Table I. Thus, the introduction of LCTs increases the resistance of the path from Vdd to ground. This also increases the propagation delay of the gate. To reduce this adverse effect, the transistors of the LCT gate are sized such that its propagation delay is equal to that of its conventional counterpart.

## III. CONTROL GATE TECHNIQUE

### A. Preliminaries

The total power consumption of a CMOS inverter can be expressed as the sum of four components as described in equation 3.

$$P_{tot} = P_{dyn\text{-switch}} + P_{dc\text{-leak}} + P_{stat\text{-leak}} + P_{th} \quad (3)$$

$P_{dyn\text{-switch}}$ is the dynamic switching power drawn when the load capacitance, $C_L$, charges through the pull-up transistor at each transition. $P_{dc\text{-leak}}$ is the dc leakage, also known as short-circuit power dissipation. It is caused by the current that flows through the direct path between the VDD and GND terminals when the pull-up and pull-down transistors conduct at the same time during a switching transition. The aim of the control gate technique is to reduce this dc leakage. $P_{stat\text{-leak}}$ and $P_{th}$ are referred to as static power dissipation, since $P_{stat\text{-leak}}$ is generated by the current, $I_{stat}$, that flows in the absence of switching activity and $P_{th}$ is caused by temperature-increased leakage current. They both depend on transistor geometry, process, technology node, and temperature.

### B. Control Gate Technique operation

The basic idea behind this approach is to create a delay on the input of the logic gate so that the pull-up and pull-down transistors do not switch at the same time; the two devices are then never ON simultaneously, which prevents short-circuit leakage. To this end, a control transistor is added on the gate of each driving transistor, separating the input of the inverter as represented in Fig. 2. The generated delay corresponds to $R_{on}$ of the control gate transistor multiplied by the $C_{gb}$ capacitance of the pull-up or pull-down transistor. The signals $C_p$ and $C_n$ control the gates of transistors $GCTP$ and $GCTN$, respectively, so that $MP$ and $MN$ do not switch at the same time.
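The delay introduced by a control gate transistor can be estimated to first order as the product $R_{on} \cdot C_{gb}$ described above. The following minimal sketch evaluates that product; the resistance and capacitance values are hypothetical placeholders chosen only to show the order of magnitude, not values measured in this work.

```python
def control_gate_delay(r_on_ohm, c_gb_farad):
    """First-order delay of one control gate transistor: R_on * C_gb (seconds)."""
    return r_on_ohm * c_gb_farad


if __name__ == "__main__":
    r_on = 10e3   # assumed on-resistance of the control transistor, in ohms
    c_gb = 1e-15  # assumed gate capacitance of the driven transistor, in farads
    tau = control_gate_delay(r_on, c_gb)
    print(f"Estimated gate delay: {tau * 1e12:.1f} ps")
```

A larger $R_{on}$ (a narrower control transistor) widens the separation between the switching instants of the pull-up and pull-down devices, at the price of a longer overall propagation delay.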

To avoid extra control signals, the gate nodes of $MP$ and $MN$ can be used to control the switching of $GCTN$ and $GCTP$, respectively, as represented in Fig. 2(a). The chronograph in Fig. 2(b) explains how this configuration operates for an inverter gate. In the first transient phase (U1), when IN rises to logic 1, $GCTP$ is ON, so $GP$ gets logic 1. At this moment (t2), $MP$ turns OFF and $GCTN$, controlled by $GP$, turns ON, so $GN$ reaches its logic 1 level ($VDD-V_{tn}$) and $MN$ turns ON. The output is thus the inverse of the input. In the other phase (U2), when IN falls to logic 0, $GN$ gets logic 0 because $GCTN$ is ON, so $MN$ turns OFF and $GCTP$ turns ON. $GP$ therefore settles at ($0+V_{tp}$), which turns $MP$ ON and drives $GCTN$ into its near-cutoff region. The states of all transistors during the rise and fall transient phases are detailed in Table II.



Fig. 2- (a) Inverter structure with CGTx, (b) CGTx inverter functioning

TABLE II

STATE MATRIX OF GATE CONTROLLED INVERTER

| Transistor | State 1  | Transient 1 (U1) |          |     |          |          | State 2  | Transient 2 (U2) |     |     |          |          |
|------------|----------|------------------|----------|-----|----------|----------|----------|------------------|-----|-----|----------|----------|
|            |          | t0               | t1       | t2  | t3       | t4       |          | t5               | t6  | t7  | t8       | t9       |
| $GCTP$     | on       | on               | on       | on  | Near off | Near off | Near off | Near off         | on  | on  | on       | on       |
| $GCTN$     | Near off | Near off         | Near off | on  | on       | on       | on       | on               | on  | on  | Near off | Near off |
| $MP$       | on       | on               | on       | off | off      | off      | off      | off              | off | off | off      | on       |
| $MN$       | off      | off              | off      | off | on       | on       | on       | on               | on  | off | off      | off      |

When $V_{in} = 1$ V, the voltage at node GN is about 700 mV, which corresponds to $V_{DD} - V_{th}$, where $V_{th}$ is the threshold voltage of the GCT transistors. This voltage is not sufficient to turn GCTP completely OFF. Similarly, when $V_{in} = 0$ V, the voltage at GP is 300 mV ($= V_{th}$), operating GCTN near its cutoff region. This loss of drive on the GP and GN nodes, which drive MP and MN respectively, results in a drop in drive current for the inverter.
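The degraded gate levels quoted above follow directly from the threshold drop across each pass transistor. The short sketch below reproduces the 700 mV and 300 mV figures, assuming $V_{DD} = 1$ V and a threshold magnitude of 0.3 V for both control transistors; the function name is illustrative.

```python
def degraded_gate_levels(vdd, vth_n, vth_p):
    """Return (GN logic-1 level, GP logic-0 level) in volts.

    vth_n and vth_p are threshold-voltage magnitudes of GCTN and GCTP.
    """
    gn_high = vdd - vth_n  # NMOS pass transistor passes a weak logic 1
    gp_low = 0.0 + vth_p   # PMOS pass transistor passes a weak logic 0
    return gn_high, gp_low


if __name__ == "__main__":
    gn_high, gp_low = degraded_gate_levels(vdd=1.0, vth_n=0.3, vth_p=0.3)
    print(f"GN logic 1 = {gn_high * 1e3:.0f} mV, GP logic 0 = {gp_low * 1e3:.0f} mV")
```

Because neither GN nor GP reaches the supply rails, MN and MP are driven with less than the full gate swing, which is the source of the reduced drive current.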



Fig. 3- Simulation results of CGTx inverter.

While implementing this technique, we observed that the average leakage power was higher than that of a simple inverter, because the $V_{th}$ drop across the pass transistors prevents the driven transistors from cutting off completely. To reduce this leakage, we adopted the standard LECTOR technique explained in Section II.

## IV. PROPOSED DESIGN AND RESULTS

### A. INVERTER



Fig. 4 (a) Proposed Inverter design



Fig. 4(b) Corresponding layout in magic

To reduce leakage, we evaluated two variants of LECTOR. The first is the one described in Section II and the second is shown in Fig. 5. From here on, CGTx+LECTOR-01 is referred to as proposed design 1 and CGTx+LECTOR-02 as proposed design 2.



Fig. 5 LECTOR modification 2

Proposed design 2 does not change the total power consumption much, but its delay values are considerably lower than those of proposed design 1. The results that follow show how effective the combined techniques are at reducing total power consumption in digital CMOS gates.



Fig. 6 VTC of the proposed inverter

TABLE III a. POWER CONSUMPTION (µW, 45 nm node)

| VDD (V) | Simple inverter | Proposed design 1 | Proposed design 2 |
|--------|-----------------|-------------------|-------------------|
| 0.7    | 0.340           | 0.154             | 0.129             |
| 0.75   | 0.436           | 0.193             | 0.169             |
| 0.8    | 0.556           | 0.238             | 0.222             |
| 0.85   | 0.706           | 0.292             | 0.291             |
| 0.9    | 0.895           | 0.365             | 0.381             |
| 0.95   | 1.133           | 0.446             | 0.5               |
| 1.00   | 1.434           | 0.5437            | 0.6569            |



Fig. 7 a. Graph for corresponding TABLE III a.



Fig. 7 b. Graph for corresponding TABLE III b.

TABLE III b. POWER CONSUMPTION (nW, 180 nm node)

| VDD (V) | Simple inverter | Proposed design 1 | Proposed design 2 |
|--------|-----------------|-------------------|-------------------|
| 1.2    | 1.330           | 0.834             | 1.024             |
| 1.3    | 1.609           | 1.136             | 1.153             |
| 1.4    | 1.939           | 1.448             | 1.348             |
| 1.5    | 2.328           | 1.661             | 1.638             |
| 1.6    | 2.790           | 1.979             | 1.953             |
| 1.7    | 3.340           | 2.245             | 1.96              |
| 1.8    | 3.996           | 2.181             | 2.208             |

The graphs above show that the combination of the CGTx and LECTOR techniques is effective in reducing the power consumption of CMOS gates. In 45 nm technology at $V_{DD}=1$ V, the simple inverter consumes 1.434 µW while proposed design 1 consumes 0.5437 µW, a 62.08% reduction in power. The propagation delay rises from 1.215 ns to 1.903 ns, an increase of 56.6% over the simple inverter (the 0.688 ns difference is 36.15% of the proposed design's delay). The technique is therefore best suited to applications with a low operating frequency, because the extra delay through the pass transistors lowers the maximum frequency of operation. In addition, we implemented a multi-threshold CMOS (MTCMOS) design in which the leakage control transistors are replaced by high-$V_{th}$ devices, since a higher threshold voltage gives lower leakage. In stacking, the extra stacked transistors are also high-$V_{th}$ devices, and the series-connected devices must be scaled up to recover performance.
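The power and delay comparison at $V_{DD} = 1$ V can be reproduced from the last rows of Tables III a and IV a. The sketch below computes both changes relative to the simple inverter; the helper names are illustrative and only the four tabulated values are taken from the report.

```python
def percent_reduction(baseline, improved):
    """Reduction of `improved` relative to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


def percent_increase(baseline, new):
    """Increase of `new` relative to `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    # Values at VDD = 1 V, 45 nm node (Tables III a and IV a).
    power_simple_uw, power_des1_uw = 1.434, 0.5437
    delay_simple_ns, delay_des1_ns = 1.215, 1.903

    saving = percent_reduction(power_simple_uw, power_des1_uw)
    penalty = percent_increase(delay_simple_ns, delay_des1_ns)
    print(f"Power reduction: {saving:.2f} %")
    print(f"Delay increase:  {penalty:.2f} %")
```

Using the simple inverter as the baseline for both quantities keeps the power saving and the delay penalty directly comparable.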

TABLE IV a. PROPAGATION DELAY (ns, 45 nm node)

| VDD (V) | Simple inverter | Proposed design 1 | Proposed design 2 |
|--------|-----------------|-------------------|-------------------|
| 0.7    | 1.471           | 2.833             | 2.029             |
| 0.75   | 1.408           | 2.585             | 1.885             |
| 0.8    | 1.354           | 2.398             | 1.763             |
| 0.85   | 1.312           | 2.235             | 1.668             |
| 0.9    | 1.281           | 2.120             | 1.583             |
| 0.95   | 1.240           | 2.007             | 1.518             |
| 1.00   | 1.215           | 1.903             | 1.477             |



Fig 8 b. Graph for corresponding TABLE IV b

TABLE IV b. PROPAGATION DELAY (ns, 180 nm node)

| VDD (V) | Simple inverter | Proposed design 1 | Proposed design 2 |
|--------|-----------------|-------------------|-------------------|
| 1.2    | 6.479           | 22.93             | 14.71             |
| 1.3    | 6.071           | 22.60             | 12.976            |
| 1.4    | 5.747           | 19.83             | 11.671            |
| 1.5    | 5.484           | 18.13             | 10.647            |
| 1.6    | 5.267           | 16.76             | 9.952             |
| 1.7    | 5.088           | 15.213            | 9.374             |
| 1.8    | 4.937           | 19.41             | 8.912             |



Fig. 8 a. Graph for corresponding TABLE IV a

### B. INVERTER CHAIN

1) Length =3



Fig. 9 a Layout of length 3 inverter chain

TABLE V a. INVERTER CHAIN OF LENGTH 3 (VDD = 1 V, 45 nm node)

| Designs         | Power (uW) | Delay (ns) |
|-----------------|------------|------------|
| simple inverter | 10.534     | 0.3126     |
| proposed des1   | 7.672      | 0.737      |
| proposed des2   | 4.842      | 0.506      |

TABLE V b. INVERTER CHAIN OF LENGTH 3 (VDD = 1.8 V, 180 nm node)

| Designs         | Power (nW) | Delay (ns) |
|-----------------|------------|------------|
| simple inverter | 27.92      | 1.3349     |
| proposed des1   | 20.35      | 4.906      |
| proposed des2   | 15.37      | 3.1308     |

2) Length =5



Fig. 9 b Layout of length 5 inverter chain

TABLE V c. INVERTER CHAIN OF LENGTH 5 (VDD = 1 V, 45 nm node)

| Designs         | Power (uW) | Delay (ns) |
|-----------------|------------|------------|
| simple inverter | 47.295     | 0.0971     |
| proposed des1   | 29.804     | 0.663      |
| proposed des2   | 43.523     | 0.381      |

TABLE V d. INVERTER CHAIN OF LENGTH 5 (VDD = 1.8 V, 180 nm node)

| Designs         | Power (nW) | Delay (ns) |
|-----------------|------------|------------|
| simple inverter | 123.6      | 0.4916     |
| proposed des1   | 52.04      | 5.468      |
| proposed des2   | 77.871     | 2.549      |

### C. NAND2 GATE



Fig. 10 (a) Proposed NAND2 design

TABLE VI. RESULTS OF NAND2 GATES (VDD = 1 V, 45 nm node)

| Designs       | Power (µW) | Best-case delay (ns) | Worst-case delay (ns) | Average delay (ns) |
|---------------|------------|----------------------|-----------------------|--------------------|
| NAND          | 1.873      | 0.97                 | 1.26                  | 1.11               |
| Proposed des1 | 1.075      | 1.87                 | 3.05                  | 2.46               |
| Proposed des2 | 1.457      | 1.46                 | 2.44                  | 1.95               |



Fig. 10 (b) Simulation results of proposed design of NAND2

### D. 2-BIT COMPARATOR



Fig. 11 (a) 2 bit comparator gate level design

TABLE VII a. POWER CONSUMPTION OF 2-BIT COMPARATOR (VDD = 1 V, 45 nm node)

| Input A | Input B | Simple design (µW) | Proposed design (µW) |
|---------|---------|--------------------|----------------------|
| 0       | 0       | 48.031             | 25.510               |
| 1       | 1       | 42.013             | 24.841               |
| 2       | 2       | 40.926             | 24.457               |

TABLE VII b. POWER CONSUMPTION OF 2-BIT COMPARATOR (VDD = 1.8 V, 180 nm node)

| Input A | Input B | Simple design (nW) | Proposed design (nW) |
|---------|---------|--------------------|----------------------|
| 0       | 0       | 127                | 94                   |
| 1       | 1       | 119.8              | 110                  |
| 2       | 2       | 117.4              | 107                  |

TABLE VII c. PROPAGATION DELAY OF 2-BIT COMPARATOR (VDD = 1 V, 45 nm node)

| Input A | Input B | Simple design (ns) | Proposed design (ns) |
|---------|---------|--------------------|----------------------|
| 0       | 0       | 0.416              | 1.02                 |
| 1       | 1       | 0.4328             | 1.036                |
| 2       | 2       | 0.407              | 1.072                |

TABLE VII d. PROPAGATION DELAY OF 2-BIT COMPARATOR (VDD = 1.8 V, 180 nm node)

| Input A | Input B | Simple design (ns) | Proposed design (ns) |
|---------|---------|--------------------|----------------------|
| 0       | 0       | 1.6592             | 4.61                 |
| 1       | 1       | 1.6607             | 4.68                 |
| 2       | 2       | 1.6670             | 4.65                 |



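```
* Piecewise-linear (PWL) stimuli for the comparator inputs A = (a1 a0) and B = (b1 b0).
* Each pair inside pwl(...) is (time, voltage); values are kept as in the original listing.
* The tiny time offsets (x.00000001) produce near-ideal edges between levels.
va1 a1 0 dc 0 pwl(0 0 1 0 1.00000001 0 2 0 2.00000001 1 3 1 4 1 4.00000001 0 5 0)
va0 a0 0 dc 0 pwl(0 0 1 0 1.00000001 1 2 1 2.00000001 0 3 0 4 0 4.00000001 1 5 1)
vb1 b1 0 dc 0 pwl(0 0 1 0 1.00000001 1 2 1 4 1 4.00000001 0 5 0)
vb0 b0 0 dc 0 pwl(0 0 1 0 1.00000001 0 2 0 3 0 3.00000001 1 5 1)
```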

Fig. 11(b) Output of comparator for above input combinations.

## V. CONCLUSION

In this report, we have discussed several low-power techniques for CMOS gates, which have become critical for achieving power-efficient digital circuits in today's energy-constrained applications. These techniques can significantly reduce power consumption at the cost of a longer propagation delay. The combination of the LECTOR and Control Gate techniques can provide additional power savings in certain circuits. For example, implementing both techniques in the inverter reduced its power consumption by about 62% at $V_{DD}=1$ V in the 45 nm node, while the propagation delay increased from 1.215 ns to 1.903 ns. All the circuits remained functional with the proposed modifications.

The LECTOR technique was found to be effective in reducing the standby current consumption in all the circuits. This technique is particularly useful in off-chip drivers, where the circuit spends a significant amount of time in the idle state. The control gate transistor technique aims to reduce short-circuit power by adding a control transistor on the gate of each driving transistor, so that the pull-up and pull-down devices do not conduct at the same time. Short-circuit power constitutes 15-20% of the dynamic power if the input and output rise and fall times are made equal.

The scaling down of device dimensions, supply voltage, and threshold voltage to achieve high performance and low dynamic power dissipation has largely contributed to the increase in leakage power dissipation. With deep-submicron and nanometer technologies, leakage current becomes more critical in portable systems where battery life is of prime concern. We have presented an efficient design methodology for reducing the leakage power in CMOS circuits. LECTOR yields better leakage reduction as the threshold voltage decreases and hence aids in further reduction of supply voltage and minimization of transistor sizes. Unlike other leakage control techniques, LECTOR does not need any control circuitry to monitor the states of the circuit. Hence, it avoids giving back the obtained leakage power reduction in the form of dynamic power consumed by additional control circuitry.

## REFERENCES

- [1] "Control Gate Technique for Low Power Consumption in Buffer Chains". Malek TEIB, Alexandre MALHERBE, Edith KUSSENER.
- [2] "LECTOR: A Technique for Leakage Reduction in CMOS Circuits". Narender Hanchate, Nagarajan Ranganathan.
- [3] "Logical effort based Power-Delay-Product Optimization". Sachin Maheshwari, Jimit Patel, Sumit K. Nirmalkar, Anu Gupta.
- [4] "Leakage Power Reduction in CMOS Logic Circuits Using Stack ONOFIC Technique". Chanchal Kumar, Avinash Sharan Mishra, Vijay Kumar Sharma.

## Contribution of work by different group members

- 1) Bhimaraddy B Y (201EC170): **Hours worked = 17-18 hrs**
  - a) Worked on the inverter and 2-bit comparator designs in the 45 nm and 180 nm technology nodes. Implemented both the simple and the proposed designs in ngspice and calculated the power consumption.
  - b) Drew the layout of the proposed inverter design in the Magic layout tool and simulated the layout-extracted netlist.
  - c) In the inverter simulations, calculated the variation of power consumption for different VDD values, as shown in the report.
  - d) Verified the power of the multi-$V_{th}$ design for the LECTOR leakage reduction technique.
  - e) Prepared the report sections on the above topics.
- 2) Sampath N (201EC158): **Hours worked = 12-14 hrs**
  - a) Worked on the inverter chains of length 3 and 5 driving a 1 pF load capacitance.
  - b) Implemented both the simple and the proposed designs of the inverter chains of length 3 and 5 in ngspice and drew the layouts of the proposed designs in the Magic tool.
  - c) Simulated the layout-extracted netlists and calculated the total power consumption and delay values for both designs in the 45 nm and 180 nm technology nodes.
  - d) Verified the buffer insertion strategy for increasing performance and the importance of an effective fanout of four (FO4).
  - e) Prepared the report sections on the above topics.
- 3) Rohit Kumar (201EC154): **Hours worked = 12-14 hrs**
  - a) Worked on the implementation of the NAND2 gate in ngspice. Wrote the netlists for the simple design and the two proposed designs in both the 45 nm and 180 nm technology nodes.
  - b) Simulated the netlists and calculated the total power consumption in each case.
  - c) Calculated the best-case, worst-case, and average delay values, since the propagation delay depends on the input combination.
  - d) Prepared the report sections on the above topics.

Google Drive link to our work:

[https://drive.google.com/drive/folders/11qAT44WIUtKsHCgQ14cwma8q11EHqNv?usp=share_link](https://drive.google.com/drive/folders/11qAT44WIUtKsHCgQ14cwma8q11EHqNv?usp=share_link)

# Control Gate Technique for Low Power Consumption in Buffer Chains

Malek TEIB<sup>\*†‡</sup>, Alexandre MALHERBE <sup>\*§</sup>, Edith KUSSENER <sup>†¶</sup>

<sup>\*</sup>STMicroelectronics, Rousset, France

<sup>†</sup>Aix-Marseille University, IM2NP UMR 7334, Marseille, France

Email: <sup>‡</sup>malek.teib@st.com, <sup>§</sup>alexandre.malherbe@st.com, <sup>¶</sup>edith.kussener@cciamp.com

**Abstract**—The high power consumption of buffer chains is a serious impediment, particularly for low-power and energy-harvesting systems. In this paper, we present a novel approach named CGTx (Control Gate Transistor) for designing CMOS buffer chains. We introduce two gate control transistors that separate and regulate the pull-up and pull-down transistor gates. This configuration reduces short-circuit leakage, which helps to reduce total power consumption. The proposed technique is compared to the LECTOR (LEakage Control TransistOR) technique and to a conventional inverter gate in terms of leakage power during switching for a single inverter gate and total power consumption for a chain of six inverters. The total consumption for a chain of six inverters with high drive capacitance using CGTx inverters is 4% lower compared to conventional inverters and 28% lower compared to the LECTOR technique.

**Keywords**—Consumption optimization, Leakage power, Gate control, Buffers, driving capacitance

## I. INTRODUCTION

The constant trend towards smaller, more minimalist, and more efficient wearables calls for an effective power management system design. A wearable device's compact size may make it impossible to employ large batteries and requires creative power management strategies. An often debated approach is the use of technologies that harvest energy from external sources to charge the device. Energy harvesting makes low-power systems more self-sufficient and opens the possibility of developing battery-free systems [1]. To meet the target application's power requirements, the amount of energy consumed by the harvesting circuit must be significantly lower than the energy harvested from the source. Thus, power extraction and transfer are critical design factors that influence the overall efficiency of the power management system for energy harvesting.

One component of an energy harvesting power management system is the power converter, which raises the output voltage of the energy transducer to a suitable level. The power converter can be implemented using either an inductive boost converter or a charge pump [2]. To ensure energy transmission, boost converters and charge pumps both need an external clock signal, leading to the use of power buffers to drive charged capacitors. However, to drive a large capacitive load, a chain of buffers is needed to increase the drain current strength, which results in a significant power leakage through the buffers that grows exponentially with the inverter's size. Thus, reducing buffer leakage power is essential, especially for harvesting systems, where as much harvested energy as possible must be saved.

The focus of this paper is, therefore, to reduce buffer power consumption using a new technique for designing CMOS inverter gate circuits called CGTx (Control Gate Transistor). Section II reviews the state of the art in leakage power control and reduction methods. Section III introduces our design strategy and approach for minimizing buffer power consumption. Results of the proposed circuit and a comparison with other works are reported in Section IV. Finally, Section V concludes the paper.

## II. STATE OF THE ART

The literature has described a variety of leakage power control and minimization approaches [3]. The simplest is the Stack technique, which consists of stacking pull-up and pull-down transistors, considering that two or more series-connected transistors, when turned off together, have significantly less leakage than a single transistor. The sleep transistor technique is another way of limiting leakage power, using a sleep transistor between $VDD$ and the pull-up transistor and/or between the pull-down transistor and $GND$. The sleep transistor switches off the circuit by shutting down the power rails in idle mode, which can significantly reduce leakage power. A high-threshold sleep transistor can be used, as described by the multiple threshold voltage CMOS (MTCMOS) technique in [4], in order to achieve larger leakage reduction and maintain fast logic switching speed.

More sophisticated leakage power approaches include the Sleepy Stack Transistor technique [5], which combines forced stack and sleep transistor methods, the Sleepy Keeper methodology, which uses feedback leakage, and the Variable Body Biasing strategy. The work in [3] presents and compares the efficiency of the discussed techniques.

Another approach for leakage power reduction is the LECTOR technique, in which two leakage control transistors (LCTs), one PMOS and one NMOS, are connected between the pull-up and pull-down transistors, with the gate terminal of each leakage control transistor controlled by the other transistor's source, as represented in Fig. 1. The LECTOR technique can be seen as self-controlled stacked transistors, since each LCT is controlled by the source of the other LCT. These two LCTs increase the resistance between $VDD$ and $GND$, which minimizes leakage current [6].



Fig. 1. Inverter structure with LECTOR technique

## III. CONTROL GATE TECHNIQUE

### A. Preliminaries

The total power consumption of a CMOS inverter can be expressed as the sum of four components as described in equation 1.

$$P_{tot} = P_{dyn\text{-switch}} + P_{dc\text{-leak}} + P_{stat\text{-leak}} + P_{th} \quad (1)$$

$P_{dyn\text{-switch}}$ is the dynamic switching power drawn when the load capacitance, $C_L$, charges through the pull-up transistor at each transition. It can be expressed as $P_{dyn\text{-switch}} = C_L VDD^2 f/2$, where $f$ is the switching frequency.

$P_{dc\text{-leak}}$ is the dc leakage, also known as short-circuit power dissipation. It is caused by the current flow through the direct path existing between the $VDD$ and $GND$ terminals when the pull-up and pull-down transistors conduct at the same time during a switching transition. The aim of this paper is to reduce dc power leakage.

$P_{stat\text{-leak}}$ and $P_{th}$ are referred to as static power dissipation, since $P_{stat\text{-leak}}$ is generated by the current flow, $I_{stat}$, in the absence of switching activity and $P_{th}$ is caused by temperature-increased leakage current. They both depend on transistor geometry, process, technology node, and temperature.
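To make the dynamic switching term concrete, the sketch below evaluates $C_L VDD^2 f/2$ exactly as written in the text. The load capacitance, supply voltages, and switching frequency used in the example are assumed values for illustration only.

```python
def dynamic_switching_power(c_load_f, vdd_v, freq_hz):
    """Dynamic switching power P = C_L * VDD^2 * f / 2, in watts."""
    return c_load_f * vdd_v ** 2 * freq_hz / 2.0


if __name__ == "__main__":
    c_load = 1e-12  # assumed 1 pF load capacitance
    freq = 1e6      # assumed 1 MHz switching frequency
    for vdd in (0.7, 1.0):
        p_dyn = dynamic_switching_power(c_load, vdd, freq)
        print(f"VDD = {vdd:.1f} V -> P_dyn-switch = {p_dyn * 1e6:.3f} uW")
```

The quadratic dependence on $VDD$ explains why supply scaling is so effective for dynamic power, and why the remaining short-circuit and static terms become relatively more important at low supply voltages.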

#### B. Control Gate Technique operation

The basic idea behind this approach is to create a delay on the input of the logic gate so that the pull-up and pull-down transistors do not switch at the same time. Both devices are then never on simultaneously, which prevents short-circuit leakage. To this end, a control transistor is added on the gate of each driving transistor, separating the input of the inverter as represented in Fig. 5(a). The generated delay therefore corresponds to $R_{on}$ of the Control Gate transistor multiplied by the $C_{gb}$ capacitance of the pull-up or pull-down transistor. The $C_p$ and $C_n$ signals control the gates of transistors GCTP and GCTN respectively so that MP and MN do not switch at the same time.
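A minimal first-order sketch of the delay described above, taken as the product of the control transistor on-resistance and the gate capacitance it drives. The numerical values are hypothetical and are not taken from the simulated 40 nm devices.

```python
def control_gate_delay(r_on, c_gb):
    """First-order delay estimate: R_on of the control gate transistor
    times the C_gb capacitance of the driven pull-up or pull-down device."""
    return r_on * c_gb


# Hypothetical values: 20 kOhm on-resistance, 0.5 fF gate capacitance.
delay = control_gate_delay(20e3, 0.5e-15)
print(f"Estimated control-gate delay: {delay * 1e12:.1f} ps")
```

A larger $R_{on}$ (a narrower control transistor) lengthens the separation between the pull-up and pull-down switching instants, at the cost of a slower gate.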



Fig. 2. CGTx approach

To avoid having extra control signals, we can use the MP and MN gate nodes to control the switching of GCTN and GCTP respectively, as represented in Fig. 3(a). The chronograph in Fig. 3(b) explains the functioning of this configuration for an inverter gate. In one transient phase (U1), when IN rises to logic 1, GCTP is ON, so GP gets logic 1. At this moment (t2), MP turns OFF and GCTN, controlled by GP, turns ON, so GN gets its logic 1 ($VDD - Vtn$) and therefore MN turns ON. Thus we get the inverted output. In the other phase (U2), when IN falls to logic 0, since GCTN is ON, GN gets logic 0, so the MN transistor turns OFF and GCTP turns ON. Therefore GP goes to $(0 + Vtp)$, which turns MP ON and drives GCTN near its cutoff region. The states of all transistors during the rise and fall transient phases are detailed in Table I.



Fig. 3. (a) Inverter structure with CGTx, (b) CGTx inverter functioning

TABLE I  
STATE MATRIX OF GATE CONTROLLED INVERTER

| Transistor | State 1  | Transient 1 (U1) |          |     |          |          | State 2  | Transient 2 (U2) |          |     |          |          |
|------------|----------|------------------|----------|-----|----------|----------|----------|------------------|----------|-----|----------|----------|
|            |          | t0               | t1       | t2  | t3       | t4       |          | t5               | t6       | t7  | t8       | t9       |
| GCTP       | on       | on               | on       | on  | Near off | Near off | Near off | Near off         | Near off | on  | on       | on       |
| GCTN       | Near off | Near off         | Near off | on  | on       | on       | on       | on               | on       | on  | Near off | Near off |
| MP         | on       | on               | on       | off | off      | off      | off      | off              | off      | off | on       | on       |
| MN         | off      | off              | off      | off | on       | on       | on       | on               | on       | off | off      | off      |
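The state matrix in Table I can be checked mechanically. The sketch below transcribes the table and lists the columns in which MP and MN are both on (the short-circuit condition the technique aims to avoid) and those in which both are off. The variable and function names are illustrative only.

```python
COLUMNS = ["State 1", "t0", "t1", "t2", "t3", "t4",
           "State 2", "t5", "t6", "t7", "t8", "t9"]

# Transcription of Table I, one list entry per column above.
STATES = {
    "GCTP": ["on", "on", "on", "on", "near off", "near off",
             "near off", "near off", "near off", "on", "on", "on"],
    "GCTN": ["near off", "near off", "near off", "on", "on", "on",
             "on", "on", "on", "on", "near off", "near off"],
    "MP":   ["on", "on", "on", "off", "off", "off",
             "off", "off", "off", "off", "on", "on"],
    "MN":   ["off", "off", "off", "off", "on", "on",
             "on", "on", "on", "off", "off", "off"],
}


def columns_matching(first, second, state):
    """Return the columns where both named transistors are in `state`."""
    return [col for col, a, b in zip(COLUMNS, STATES[first], STATES[second])
            if a == state and b == state]


overlap = columns_matching("MP", "MN", "on")
dead_time = columns_matching("MP", "MN", "off")

print("MP and MN both on:", overlap)
print("MP and MN both off:", dead_time)
```

According to the table, the driving transistors never conduct together, and each transition contains one instant (t2 and t7) in which both are off.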

With this wiring configuration, one of the GCTs is always near its cutoff region. This can be seen from the simulation shown in Fig. 4. When $V_{in} = 1\text{ V}$, the voltage at the node GN is about 700 mV, which corresponds to $V_{DD} - V_{th}$, where $V_{th}$ is the threshold voltage of the GCT transistors. This voltage is not sufficient to turn GCTP completely to its OFF state. Similarly, when $V_{in} = 0\text{ V}$, the voltage at GP is at 300 mV ($= V_{th}$), operating GCTN near its cutoff region.

Fig. 4. Simulation results of CGTx inverter

This drive loss on the GP and GN nodes, which drive MP and MN respectively, results in a drop in drive current for the case of a buffer. Therefore we propose the configuration represented in Fig. 5 to force the cutoff of the control gate transistors GCTP2 and GCTN2.

M1 is ON when GP1 is at $V_{th}$ and M2 is ON when OUT2 is at 0, which ensures that GN2 goes to 1 and completely turns off the GCTP2 transistor. The M5 and M6 transistors are ON to prevent a reverse current in GCTP2 during its switching transition.

Symmetrically, M3 is ON when GN1 is at  $V_{DD} - V_{th}$  and M4 is ON when OUT2 goes to 1 which forces GP2 to 0 to completely turn off GCTN2. M7 and M8 prevent a reverse current in GCTN2 during its switching.



Fig. 5. CGTx buffer circuit

## IV. RESULTS AND DISCUSSION

Our proposed technique is compared with the conventional inverter gate and the LECTOR inverter gate. The evaluations were performed with the Eldo simulator in 40 nm CMOS technology at 1 MHz input frequency.

Fig. 6 depicts the electrical charge lost in the pull-up and pull-down transistors from the power supply voltage ($V_{DD}$) during rise and fall transitions for our proposed Gate Control Technique, the LECTOR technique and conventional inverter gates. Fig. 6(a) shows the output voltage and the lost charge quantity in the NMOS during the rising transition, and Fig. 6(b) shows the output voltage and the lost charge quantity in the PMOS during the falling transition. The lost charge quantity is obtained by integrating the current on the drains of the NMOS and PMOS respectively. During switching, the lost current for the LECTOR inverter gate and the conventional inverter gate is higher. The CGTx inverter gate has the least lost current in both rising and falling transitions.
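The lost charge is described above as the time integral of the drain current. A minimal sketch of that post-processing step, using the trapezoidal rule on sampled current values, is given below. The triangular current pulse is a made-up waveform, not simulator output.

```python
def integrate_charge(times, currents):
    """Integrate sampled current (A) over time (s) with the trapezoidal
    rule and return the charge in coulombs."""
    if len(times) != len(currents):
        raise ValueError("times and currents must have the same length")
    charge = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        charge += 0.5 * (currents[i] + currents[i - 1]) * dt
    return charge


# Hypothetical short-circuit pulse: triangular, 10 uA peak, 100 ps wide.
times = [0.0, 50e-12, 100e-12]
currents = [0.0, 10e-6, 0.0]
lost_charge = integrate_charge(times, currents)

print(f"Lost charge: {lost_charge * 1e15:.2f} fC")
```

For the triangular pulse the result equals half the peak current times the pulse width, which gives a quick sanity check on the integration.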



Fig. 6. Switching Voltage and Lost electrical charges (a) in NMOS, (b) in PMOS

The main goal of the work described in this paper is to reduce the power consumption of CMOS buffers driving high capacitance using a circuit-level method. Therefore, we studied the total power consumption for six inverters in series and for six inverters in series with additional inverter drive capacitance, as presented respectively in Fig. 7(a) and Fig. 7(b). $W_p$ and $W_n$ are identical for all PMOS and NMOS transistors of the buffer chain, with a $W_p/W_n$ ratio equal to 3.

It is important to note here that in the case of our proposed CGTx technique, we used the configuration of Fig. 5 for I5 and I6 of the structures in Fig. 7 to avoid drive loss.

Simulation results of total charge quantity consumption during  $20\text{ }\mu\text{s}$  as a function of load capacitance for  $V_{DD} = 1\text{ V}$  and  $V_{DD} = 2\text{ V}$  are reported in Fig. 8.

As shown in Fig. 8(a), when $V_{DD} = 1\text{ V}$, the total charge consumption for the CGTx technique and the conventional inverter technique are almost identical, with a slight increase in the CGTx technique's consumption for higher load capacitance.

The LECTOR approach consumes more power and is unable to drive loads with capacitance over $200 \text{ fF}$ for $VDD = 1 \text{ V}$. When $VDD = 2 \text{ V}$, we obtain the same behaviour of power consumption for the CGTx and conventional inverter techniques as when $VDD = 1 \text{ V}$. The LECTOR inverter chain is capable of driving a larger load capacitance with $VDD = 2 \text{ V}$. However, it consumes more power than the other two techniques.

The simulated total power consumption for 6x inverters chain with increased single-inverter drive capacitance is shown in Fig. 8(b). Regardless of  $VDD = 1 \text{ V}$  or  $VDD = 2 \text{ V}$ , the CGTx approach has the lowest overall power consumption whereas the LECTOR technique has a much higher consumption.

If we apply the CGTx technique to a chain of six inverters with high drive capacitance during  $20 \mu\text{s}$  with  $1 \text{ MHz}$  switching frequency, the total power consumption is 4% lower than conventional inverters and 28% lower than LECTOR technique. Therefore, the proposed CGTx method shows its effectiveness in power consumption reduction of CMOS buffer chains driving high capacitance.



Fig. 7. Simulated structures: (a) 6x inverters, (b) 6x inverters with additional drive capacitance



Fig. 8. Total consumed quantity of electricity during  $20 \mu\text{s}$  for (a) 6x inverters, (b) 6x inverters with additional drive capacitance

## V. CONCLUSION

In energy harvesting power management systems, boost converters are essential components as they increase the output voltage of the energy transducer to a suitable level. To ensure energy transfer, these boost converters require a clock signal and large switching capacitors. Therefore, a chain of buffers must be employed to drive large capacitive loads, resulting in significant buffer power consumption. As a result, reducing the power consumption of buffers is extremely desirable in low-power and energy-harvesting systems. In this work, we present a new circuit technique named CGTx (Control Gate Transistor) to reduce leakage power in buffers. On the gate of each driving transistor, a control transistor is added to generate a delay on the input of the logic gate so that the pull-up and pull-down transistors do not switch at the same time, reducing short-circuit leakage. This implementation is compared with the conventional inverter gate and the LECTOR inverter gate. During switching, leakage currents on the drains of the LECTOR inverter gate and the conventional inverter gate are larger since the pull-up and pull-down transistors are both in the saturation region. The CGTx inverter gate has the least leakage, as the pull-up and pull-down transistors do not switch at the same time. The proposed CGTx technique is a promising implementation for reducing the power consumption of CMOS buffer chains driving high capacitance. The total consumption for a chain of six inverters with high drive capacitance using CGTx inverters during $20 \mu\text{s}$ with $1 \text{ MHz}$ switching frequency at $VDD = 1 \text{ V}$ is 4% lower compared to conventional inverters and 28% lower compared to the LECTOR technique.

## REFERENCES

- [1] K. Lundager, B. Zeinali, M. Tohidi, J. K. Madsen, and F. Moradi, "Low power design for future wearable and implantable devices," *Journal of Low Power Electronics and Applications*, vol. 6, no. 4, 2016.
- [2] F. Quintero, R. Rafael, Flores-Verdad, and G. Espinosa, "A fully integrated parallel stages converter for thermal energy harvesting," *Analog Integrated Circuits and Signal Processing*, vol. 103, pp. 95–101, 2020.
- [3] A. Yadav and R. Kumar, "Low power design techniques for reduction of leakage power in CMOS VLSI circuits using modified sleepy keeper," *IJECT*, vol. 6, 2015.
- [4] M. W. Rahman Khan and M. Mohiuddin Uzzal, "Multi-threshold CMOS devices: A comparative analysis of leakage power and delay in digital circuits for nano-scale technology," in *2019 2nd International Conference on Innovation in Engineering and Technology (ICIET)*, 2019, pp. 1–6.
- [5] J. C. Park and V. J. Mooney III, "Sleepy stack leakage reduction," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 14, no. 11, pp. 1250–1263, 2006.
- [6] N. Hanchate and N. Ranganathan, "LECTOR: A technique for leakage reduction in CMOS circuits," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 12, no. 2, pp. 196–205, 2004.

# LECTOR: A Technique for Leakage Reduction in CMOS Circuits

Narendra Hanchate, *Student Member, IEEE*, and Nagarajan Ranganathan, *Fellow, IEEE*

**Abstract**—In CMOS circuits, the reduction of the threshold voltage due to voltage scaling leads to an increase in subthreshold leakage current and hence static power dissipation. We propose a novel technique called LECTOR for designing CMOS gates which significantly cuts down the leakage current without increasing the dynamic power dissipation. In the proposed technique, we introduce two leakage control transistors (a p-type and an n-type) within the logic gate for which the gate terminal of each leakage control transistor (LCT) is controlled by the source of the other. In this arrangement, one of the LCTs is always “near its cutoff voltage” for any input combination. This increases the resistance of the path from $V_{dd}$ to ground, leading to a significant decrease in leakage currents. The gate-level netlist of the given circuit is first converted into a static CMOS complex gate implementation and then LCTs are introduced to obtain a leakage-controlled circuit. The significant feature of LECTOR is that it works effectively in both active and idle states of the circuit, resulting in better leakage reduction compared to other techniques. Further, the proposed technique overcomes the limitations posed by other existing methods for leakage reduction. Experimental results indicate an average leakage reduction of 79.4% for MCNC’91 benchmark circuits.

**Index Terms**—Deep submicron, leakage power, power optimization, transistor stacking.

## I. INTRODUCTION

POWER dissipation is an important consideration in the design of CMOS VLSI circuits. High power consumption leads to reduction in the battery life in the case of battery-powered applications and affects reliability, packaging, and cooling costs. The main sources for power dissipation are: 1) capacitive power dissipation due to the charging and discharging of the load capacitance; 2) short-circuit currents due to the existence of a conducting path between the voltage supply and ground for the brief period during which a logic gate makes a transition; and 3) leakage current. The leakage current consists of reverse-bias diode currents and subthreshold currents. The former is due to the stored charge between the drain and bulk of active transistors while the latter is due to the carrier diffusion between the source and drain of the OFF transistors.

The short-circuit power dissipation can be reduced to 10% of total power dissipation by designing the circuit to have equal input and output rise/fall edge times [1]. The power dissipation resulting from switching activity is the dominant component for technology processes with feature size larger than 1 $\mu\text{m}$. With technology processes maturing toward the deep-submicron regime, the feature sizes of the transistors are becoming smaller, thereby reducing the load capacitances. The reduction in feature size also forces a reduction in the supply voltage. The voltage scaling technique takes benefit of the quadratic dependence of switching power on supply voltage for dynamic power savings. However, this technique pays a penalty for the operation of the circuit by increasing the delay drastically as supply voltage approaches the threshold voltage $V_t$ of the devices [2]. In order to facilitate voltage scaling without affecting the performance, threshold voltage has to be reduced. In general, the ratio between the supply voltage and the threshold voltage should be at least 5, so that the performance of CMOS circuits is not affected [3]. This also leads to better noise margins and helps to avoid the hot-carrier effects in short-channel devices [4].

Manuscript received December 11, 2002; revised August 20, 2003. This paper was recommended by Associate Editor Dr. R. Secareanu.

The authors are with the Nanomaterials and Nanomanufacturing Research Center, Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620 USA (e-mail: narendr@csee.usf.edu; ranganat@csee.usf.edu).

Digital Object Identifier 10.1109/TVLSI.2003.821547

Fig. 1. Voltage supply and threshold voltage scaling trend.

Scaling down of threshold voltage  $V_t$  results in exponential increase of the subthreshold leakage current [5]. The supply voltage and threshold voltage scaling trends for Intel’s microprocessor process technologies are discussed in [6]. It can be seen from Fig. 1 that the leakage power is only 0.01% of the active power for 1- $\mu\text{m}$  technology, while it is 10% of the active power for 0.1- $\mu\text{m}$  technology. There is a fivefold increase in leakage power as the technology process advances to a new generation. Projecting these trends, it can be seen that the leakage power dissipation will equal the active power dissipation within a few generations. Hence, efficient leakage power reduction methods are very critical for the deep-submicron and nanometer circuits.

In this paper, we describe a new leakage power reduction technique called LECTOR (LEakage Control Transistor) for designing CMOS circuits. The rest of the paper is organized as follows. Section II describes briefly the prior works on leakage power reduction and their limitations. Section III introduces the transistor models used for estimating the leakage power. Our design strategy and an approach for minimizing the area overhead are described in Sections IV and V, respectively. Results are presented in Section VI, followed by conclusions in Section VII.

## II. RELATED WORK

Numerous methods for leakage power control have been reported in the literature. The work in [7] makes use of the dependence of the leakage current on the input vector to the gate. With additional control logic, the circuit is put into a low-leakage standby state when it is idle and restored to the original state when reactivated. Reactivation state forces the need to remember the original state information before going to low-leakage standby state. This requires special latches, thereby increasing the area of the circuit by about five times in the worst case [8]. Also, the amount of time for which the unit remains in idle state should be long enough so that the dynamic power consumed in forcing the circuit to low-leakage state and the leakage power dissipated in the standby state together is less than the leakage power without the technique.
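The break-even condition stated at the end of the paragraph above can be written as $E_{tr} + P_{standby}\,t < P_{leak}\,t$, where $E_{tr}$ is the energy spent forcing the circuit into and out of the low-leakage state. The sketch below solves this for the minimum idle time. The symbol names and all numerical values are assumptions made for illustration.

```python
def min_idle_time(e_transition, p_leak, p_standby):
    """Smallest idle duration (s) for which entering the low-leakage
    standby state saves energy: e_transition + p_standby*t < p_leak*t."""
    saving_per_second = p_leak - p_standby
    if saving_per_second <= 0:
        return float("inf")  # standby state never pays off
    return e_transition / saving_per_second


# Hypothetical numbers: 1 pJ transition energy, 2 nW normal leakage,
# 0.5 nW leakage in the low-leakage standby state.
t_break_even = min_idle_time(1e-12, 2e-9, 0.5e-9)
print(f"Break-even idle time: {t_break_even * 1e3:.3f} ms")
```

Idle periods shorter than this value cost more energy than they save, which is why the technique only helps when the unit stays idle long enough.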

Another technique for leakage power control is power gating, which turns off the devices by cutting off their supply voltage [9], [10]. This technique makes use of a bulky NMOS and/or PMOS device (sleep transistor) in the path between the supply voltage and ground. The sleep transistor is turned on when the circuit is active and turned off when the circuit is in idle state with the help of sleep signal. This creates virtual power and ground rails in the circuit. Hence, there is a significant detrimental effect on the switching speed when the circuit is active. The identification of the idle regions of the circuit and the generation of the sleep signal need additional hardware capable of predicting the circuit states accurately. This additional hardware consumes power throughout the circuit operation even when the circuit is in an idle state to continuously monitor the circuit state and control the sleep transistors.

The use of multiple threshold voltage CMOS (MTCMOS) technology for leakage control is described in [11] and [12]. The transistors of the gates are at low threshold voltage and the ground is connected to the gate through a high-threshold voltage NMOS gating transistor. The logical function of a gating transistor is similar to that of a sleep transistor. The existence of reverse conduction paths tend to reduce the noise margin or in the worst case may result in complete failure of the gate [4]. Moreover, there is a performance penalty since high-threshold transistors appear in series with all the switching current paths. A variation of MTCMOS technique is the Dual  $V_t$  technique, which uses transistors with two different threshold voltages. Low-threshold transistors are used for the gates on the critical path and high-threshold transistors are used for those not in the critical path [4], [13], [14]. In both MTCMOS and Dual  $V_t$  methods, additional mask layers for each value of threshold voltage are required for fabricating the transistors selectively according to their assigned threshold voltage values. This makes the fabrication process complex.

In addition to these limitations, the techniques discussed above suffer from turning-on latency, that is, when the idle subsections of the circuit are reactivated, they cannot be used immediately because some time is needed before the subcircuit returns to its normal operating condition. The latency for power gating is typically a few cycles, and for Dual $V_t$ technology, is much higher [15]. Also, these techniques are not effective in controlling the leakage power when the circuit is in active state.

In [16], the authors use the concept of forced stacks for leakage control. Forced stacking introduces an additional transistor for every input of the gate in both N- and P-networks. This ensures that two transistors are OFF instead of one for every OFF-input of the gate and hence makes a significant savings on the leakage current. However, the loading requirements for each input introduced by the forced stacking reduces the drive current of the gate significantly. This results in a detrimental impact on the speed of the circuit.

In [10], a blend of sleep transistors and the stacking effects are used to reduce leakage power. This method identifies a circuit input vector for which the leakage current of the circuit is the lowest possible. The sleep signal controlled transistors are inserted away from the critical path where only one transistor is OFF when low-leakage input vector is applied to the circuit. Hence, this technique is input-vector dependent. Moreover, as this technique uses sleep transistors, it needs additional hardware for controlling them. This additional hardware consumes power in both idle and active states of the circuit.

In this work, we develop a new technique for leakage control in CMOS circuits. The proposed technique avoids the problems associated with the techniques discussed above.

## III. PRELIMINARIES

In this section, we briefly describe the models used in this work for estimating power dissipation for short-channel MOSFETs. The leakage current calculation is not straightforward due to the highly nonlinear behavior of the drain current of the device with respect to source/drain voltages. We have used the Berkeley Short-Channel IGFET (BSIM) Predictive Technology Model to estimate the leakage power dissipation, as it fits well with HSPICE simulations [5]. In the BSIM model, the threshold voltage  $V_t$  is expressed as

$$V_t = V_{FB} + \phi_s + k_1 \sqrt{\phi_s} - k_2 \phi_s - \eta V_{dd} \quad (1)$$

where  $V_{FB}$  is the flatband voltage,  $\phi_s$  is twice the Fermi potential,  $k_1$  and  $k_2$  represent the nonuniform doping effect, and  $\eta$  models the drain-induced barrier lowering (DIBL) effect, an undesirable punch-through current flowing between the source and drain below the surface of the channel. The leakage current for NMOS transistors operating in weak-inversion region (i.e.,  $V_{gs} = 0$ ) is given by

$$I_s = I_0 \exp\left(\frac{(V_{gs} - V_t)}{nV_T}\right) \left(1 - \exp\left(-\frac{V_{ds}}{V_T}\right)\right) \quad (2)$$

where $V_T$ is the thermal voltage and is given by $kT/q$, $n$ is the subthreshold slope coefficient, and $I_0 = \mu_0 C_{ox} (W_{eff}/L_{eff}) V_T^2 e^{1.8}$. Equation (2) gives a simple method for estimating the leakage current in a single NMOS transistor. A similar expression for the leakage current in a single PMOS transistor can be obtained.
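To make the exponential dependence in (2) concrete, the following sketch evaluates the subthreshold current for two threshold voltages, taking the thermal voltage as $kT/q$ (about 26 mV at 300 K). The prefactor $I_0$, the slope coefficient and the threshold values are hypothetical and are not the BSIM model parameters used in the paper.

```python
import math

BOLTZMANN = 1.380649e-23         # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage V_T = kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def subthreshold_current(i0, vgs, vds, vt, n, temp_kelvin=300.0):
    """Leakage current of a single NMOS device following (2)."""
    v_thermal = thermal_voltage(temp_kelvin)
    return (i0 * math.exp((vgs - vt) / (n * v_thermal))
            * (1.0 - math.exp(-vds / v_thermal)))


# Hypothetical device: I0 = 1 uA, n = 1.5, Vgs = 0 V, Vds = 1 V.
i_high_vt = subthreshold_current(1e-6, 0.0, 1.0, vt=0.30, n=1.5)
i_low_vt = subthreshold_current(1e-6, 0.0, 1.0, vt=0.20, n=1.5)
ratio = i_low_vt / i_high_vt

print(f"V_T at 300 K: {thermal_voltage() * 1e3:.2f} mV")
print(f"Leakage increase for a 100 mV lower threshold: {ratio:.1f}x")
```

With these assumed parameters, lowering the threshold voltage by 100 mV raises the leakage by roughly an order of magnitude, which is the trend the introduction describes.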

Fig. 2. Two input LCT NAND gates. The leakage control transistors $LCT_1$ and $LCT_2$ are inserted between nodes $N_1$ and $N_2$ and they act as self-controlled stacked transistors.

In most CMOS logic design styles, the gates consist of series-parallel networks of PMOS and NMOS transistors. The leakage current contributed by the MOS transistors connected in parallel is the sum of the currents through the individual transistors. However, in the case of transistors connected in series, the evaluation of leakage current becomes complex due to its nonlinear characteristics. The analysis of the current through a stack of transistors when they operate in subthreshold region is as given in [3]:

$$I_{s1}:I_{s2}:I_{s3} = 1.8 \exp\left(\frac{\eta V_{dd}}{nV_T}\right) : 1.8:1 \quad (3)$$

where  $I_{si}$  ( $i = 1, 2, 3$ ) is the leakage current for  $i$  stacked MOS transistors and  $I_{s1}$  is given by (2). Equation (3) shows that the leakage current of the MOS network can be expressed as a function of a single MOS transistor. If the number of stacked MOS transistors is more than three, the leakage current is very small and can be neglected.
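The ratio in (3) can be evaluated numerically as a quick check of how strongly stacking reduces leakage. This sketch assumes the voltage in the denominator of the exponent is the thermal voltage, as in (2). The DIBL coefficient and slope coefficient are hypothetical values.

```python
import math


def stack_leakage_ratios(eta, vdd, n, v_thermal):
    """Return (I_s1, I_s2, I_s3) normalised to I_s3 = 1, following (3)."""
    one_device = 1.8 * math.exp(eta * vdd / (n * v_thermal))
    return one_device, 1.8, 1.0


# Hypothetical parameters: eta = 0.05, Vdd = 1 V, n = 1.5, V_T = 25.85 mV.
r1, r2, r3 = stack_leakage_ratios(eta=0.05, vdd=1.0, n=1.5, v_thermal=0.02585)

print(f"I_s1 : I_s2 : I_s3 = {r1:.2f} : {r2:.2f} : {r3:.2f}")
print(f"Two-transistor stack leaks {r1 / r2:.2f}x less than a single device")
```

The gain from going from one to two stacked devices depends exponentially on $\eta V_{dd}$, while the step from two to three devices is the fixed factor 1.8.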

## IV. LECTOR TECHNIQUE

The basic idea behind our approach for reduction of leakage power is the effective stacking of transistors in the path from supply voltage to ground. This is based on the observation made in [10], [16], [17] that “a state with more than one transistor OFF in a path from supply voltage to ground is far less leaky than a state with only one transistor OFF in any supply to ground path.” In our method, we introduce two leakage control transistors (LCTs) in each CMOS gate such that one of the LCTs is *near its cutoff region* of operation. We illustrate our LEakage Control TransistOR technique (LECTOR) with the case of a NAND gate. A CMOS NAND gate with the addition of two leakage control transistors is shown in Fig. 2 (we later refer to it as the LCT NAND gate).



Fig. 3. DC characteristics of two-input NAND gate. (a) Characteristics of basic NAND gate. (b) Characteristics of LCT NAND gate. It can be observed that the output voltage variation is similar in both cases.

Two leakage control transistors $LCT_1$ (PMOS) and $LCT_2$ (NMOS) are introduced between the nodes $N_1$ and $N_2$ of the pull-up and pull-down logic of the NAND gate. The drain nodes of the transistors $LCT_1$ and $LCT_2$ are connected together to form the output node of the NAND gate. The source nodes of the transistors are connected to nodes $N_1$ and $N_2$ of pull-up and pull-down logic, respectively. The switching of transistors $LCT_1$ and $LCT_2$ is controlled by the voltage potentials at nodes $N_2$ and $N_1$ respectively. This wiring configuration ensures that one of the LCTs is always *near its cutoff region*, irrespective of the input vector applied to the NAND gate. This can be seen from the dc characteristics shown in Fig. 3, obtained from HSPICE simulations. Fig. 3(a) shows the dc characteristics of the unmodified NAND gate and Fig. 3(b) shows that of the LCT NAND gate, when the input $A_{in}$ is fixed at 1 V and $B_{in}$ is varied from 0 to 1 V.

Consider the dc characteristics of the LCT NAND gate. When $A_{in} = 1$ V and $B_{in} = 0$ V, the voltage at the node $N_2$ is 800 mV. This voltage is not sufficient to turn $LCT_1$ completely to OFF state. Hence, the resistance of $LCT_1$ will be lower than its OFF resistance, allowing conduction. Even though the resistance of $LCT_1$ is not as high as its OFF state resistance, it increases the resistance of $V_{dd}$ to ground path, controlling the flow of leakage currents, resulting in leakage power reduction. Similarly, when $A_{in} = 1$ V and $B_{in} = 1$ V, the voltage of node $N_1$ is 200 mV, operating the transistor $LCT_2$ near its *cutoff* region. The states of the transistors for all possible combinations of input vectors for the LCT NAND gate are tabulated in Table I. Thus, the introduction of LCTs increases the resistance of the path from $V_{dd}$ to ground. This also increases the propagation delay of the gate. To reduce this hostile effect, the transistors of the LCT gate are sized such that the propagation delay is equal to that of its conventional counterpart.

TABLE I  
STATE MATRIX OF TWO-INPUT LCT NAND GATE

| Transistor | Input Vector - ( $A_{in}, B_{in}$ ) |                    |                    |                    |
|------------|-------------------------------------|--------------------|--------------------|--------------------|
|            | (0,0)                               | (0,1)              | (1,0)              | (1,1)              |
| $M_1$      | On state                            | On state           | Off state          | Off state          |
| $M_2$      | On state                            | Off state          | On state           | Off state          |
| $LCT_1$    | Near Cut-Off state                  | Near Cut-Off state | Near Cut-Off state | On state           |
| $LCT_2$    | On state                            | On state           | On state           | Near Cut-Off state |
| $M_3$      | Off state                           | Off state          | On state           | On state           |
| $M_4$      | Off state                           | On state           | Off state          | On state           |

Fig. 4. Transient characteristics of two-input LCT NAND gate using HSPICE. (The x axis shows simulation time in nanoseconds and the y axis shows the voltage levels in millivolts).
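The state matrix of the LCT NAND gate (Table I) follows a simple rule that can be sketched in code: the pull-up devices conduct on a logic-0 input, the pull-down devices on a logic-1 input, and whichever LCT is not needed to pass the output level sits near cutoff. The assignment of $M_1$/$M_3$ to input A and $M_2$/$M_4$ to input B is inferred from the table, and the function name is illustrative.

```python
from itertools import product


def lct_nand_states(a, b):
    """Return the transistor states of a two-input LCT NAND gate for
    logic inputs a and b (0 or 1), following the pattern of Table I."""
    output_high = not (a and b)

    def on_off(condition):
        return "on" if condition else "off"

    return {
        "M1": on_off(a == 0),    # PMOS driven by A
        "M2": on_off(b == 0),    # PMOS driven by B
        "LCT1": "near cut-off" if output_high else "on",
        "LCT2": "on" if output_high else "near cut-off",
        "M3": on_off(a == 1),    # NMOS driven by A
        "M4": on_off(b == 1),    # NMOS driven by B
    }


for a, b in product((0, 1), repeat=2):
    states = lct_nand_states(a, b)
    print((a, b), states["LCT1"], "/", states["LCT2"])
```

For every input vector exactly one of the two LCTs is near cutoff, which is the property the technique relies on.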

TABLE II  
THRESHOLD VOLTAGES OF MOS MODELS USED

| Technology Process | Threshold Voltage of NMOS ( $V_{tn}$ ) | Threshold Voltage of PMOS ( $V_{tp}$ ) |
|--------------------|----------------------------------------|----------------------------------------|
| 70nm               | 0.2V                                   | -0.22V                                 |
| 100nm              | 0.2607V                                | -0.303V                                |
| 130nm              | 0.332V                                 | -0.3499V                               |
| 180nm              | 0.3999V                                | -0.42V                                 |

Fig. 4 shows the transient curves obtained by HSPICE at various nodes of the LCT NAND gate simulated for 70-nm technology at 1-V supply voltage. It can be observed from the curves that the LCT NAND produces exact output logic levels. We have also tested the LECTOR with process variations of 180, 130, and 100 nm technologies at 1-V supply voltage. The threshold voltages used by these BSIM predictive models are tabulated in Table II. We have observed that LECTOR produces slightly weak logic levels (around 960 mV of logic high and 35 mV of logic zero) for 180-nm and 130-nm process technologies. But it produces exact logic levels for 70-nm and 100-nm process variations. The reason for this behavior of LECTOR could be the difference between the supply voltage and the threshold voltage of a process variation. When we increase the supply voltage to 1.8 V, LECTOR produces exact logic levels for all the four process variations.
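The paragraph above suggests that the gap between supply and threshold voltage may explain the weaker logic levels. As a rough illustration only, the sketch below computes the supply-to-threshold ratio for the NMOS thresholds of Table II at both supply voltages. It does not model the logic levels themselves.

```python
# NMOS threshold voltages (V) from Table II.
VTN = {"70nm": 0.2, "100nm": 0.2607, "130nm": 0.332, "180nm": 0.3999}


def supply_to_threshold_ratio(vdd, vt):
    """Ratio of supply voltage to threshold voltage magnitude."""
    return vdd / abs(vt)


ratios_1v = {p: supply_to_threshold_ratio(1.0, vt) for p, vt in VTN.items()}
ratios_1v8 = {p: supply_to_threshold_ratio(1.8, vt) for p, vt in VTN.items()}

for process in VTN:
    print(f"{process}: Vdd/Vtn = {ratios_1v[process]:.2f} at 1 V, "
          f"{ratios_1v8[process]:.2f} at 1.8 V")
```

At 1 V the ratio falls from 5 for the 70-nm model to about 2.5 for the 180-nm model, and raising the supply to 1.8 V lifts every process above 4.5.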

The transistors of the LCT NAND gate are sized to study its effect on the propagation delay of the gate. Y-LCT gates are obtained by sizing the widths of only the leakage control transistors  $LCT_1$  and  $LCT_2$  to  $Y$  times the width of other PMOS and NMOS transistors of the gate, respectively. Also, we have tried to adjust the widths of all the transistors of the LCT NAND gate such that its propagation delay is almost equal to that of a conventional NAND gate. (We refer to this gate as Iso-LCT NAND gate.) The widths of PMOS and NMOS transistors, except LCT transistors, used in simulation of Y-LCT gates are set to two and three times the minimum feature width of the respective process variation.

The leakage power dissipation and propagation delay values for a two-input NAND gate using 100-nm and 70-nm technology processes are tabulated in Table III. It can be observed from Tables II and III that as the threshold voltage decreases, which is the case as we move toward deep submicron, our technique produces better leakage power reductions. Hence, the use of LECTOR facilitates further reduction of the supply voltage along with the threshold voltage, thereby favoring technology advancements.

TABLE III  
LEAKAGE POWER FOR TWO-INPUT NAND GATE

| NAND gate type | Leakage power (W), input (0,0) | Leakage power (W), input (0,1) | Leakage power (W), input (1,0) | Leakage power (W), input (1,1) | Delay (ps) | % Average savings |
|----------------|-----------|-----------|-----------|-----------|-------|--------|
| **100-nm process technology, supply voltage = 1 V** | | | | | | |
| Conventional   | 1.228e-10 | 9.117e-10 | 5.356e-10 | 2.241e-09 | 13.53 | -      |
| LCT            | 1.180e-10 | 5.542e-10 | 4.477e-10 | 1.539e-09 | 18.79 | 30.20% |
| 1.5-LCT        | 1.182e-10 | 5.788e-10 | 4.533e-10 | 1.593e-09 | 17.12 | 28.01% |
| 2-LCT          | 1.194e-10 | 5.985e-10 | 4.606e-10 | 1.621e-09 | 15.64 | 26.54% |
| 2.5-LCT        | 1.197e-10 | 6.114e-10 | 4.678e-10 | 1.651e-09 | 15.39 | 25.22% |
| Iso-LCT        | 1.224e-10 | 6.394e-10 | 4.872e-10 | 1.704e-09 | 13.59 | 22.51% |
| **70-nm process technology, supply voltage = 1 V** | | | | | | |
| Conventional   | 6.450e-10 | 5.600e-09 | 3.817e-09 | 1.091e-08 | 15.16 | -      |
| LCT            | 6.065e-10 | 3.808e-09 | 3.622e-09 | 5.564e-09 | 21.40 | 35.12% |
| 1.5-LCT        | 6.108e-10 | 3.900e-09 | 3.650e-09 | 5.742e-09 | 19.09 | 33.70% |
| 2-LCT          | 6.146e-10 | 3.961e-09 | 3.667e-09 | 5.872e-09 | 18.36 | 32.71% |
| 2.5-LCT        | 6.189e-10 | 4.016e-09 | 3.675e-09 | 6.060e-09 | 17.99 | 31.48% |
| Iso-LCT        | 6.314e-10 | 4.125e-09 | 3.689e-09 | 6.196e-09 | 15.23 | 30.18% |

Note: Y-LCT = LCT transistors are Y times the widths of their respective type.
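The "% Average Savings" column of Table III can be reproduced approximately from the per-vector leakage values. The sketch below assumes the column is the relative reduction in leakage power averaged with equal weight over the four input vectors; the text does not state the formula, so this is an interpretation. The function and variable names are illustrative only.

```python
# Leakage power (W) for input vectors (0,0), (0,1), (1,0), (1,1),
# copied from the 100-nm rows of Table III.
leakage_100nm = {
    "Conventional": [1.228e-10, 9.117e-10, 5.356e-10, 2.241e-09],
    "LCT":          [1.180e-10, 5.542e-10, 4.477e-10, 1.539e-09],
    "Iso-LCT":      [1.224e-10, 6.394e-10, 4.872e-10, 1.704e-09],
}


def average_savings_percent(gate, baseline="Conventional", table=leakage_100nm):
    """Relative reduction of the equal-weight average leakage (assumed formula)."""
    base_avg = sum(table[baseline]) / len(table[baseline])
    gate_avg = sum(table[gate]) / len(table[gate])
    return 100.0 * (1.0 - gate_avg / base_avg)


for gate in ("LCT", "Iso-LCT"):
    print(f"{gate}: {average_savings_percent(gate):.2f}% average leakage savings")
```

Under this assumption the computed values agree with the tabulated 30.20% and 22.51% to within a few hundredths of a percentage point, which is consistent with rounding of the tabulated inputs.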

In the technique discussed in [10], the sleep transistors have to be able to isolate the power supply and/or ground from the rest of the transistors of the gate. Hence, they need to be made bulkier, which dissipates more dynamic power and offsets the savings yielded when the circuit is idle. Moreover, this technique is input-vector dependent and needs additional circuitry to monitor and control the switching of the sleep transistors. Thus, it consumes power in both active and idle states. In comparison, LECTOR is vector independent and the required control signals are generated within the gate. Forced stacks [16] have 100% area overhead. In comparison, LECTOR requires exactly two additional transistors for every path from supply voltage to ground, irrespective of the logic function realized by the gate. The loading requirements of a gate with forced stacks are large and depend on the number of additional transistors added. In comparison, the loading requirements with LCTs are much lower and constant. Hence, the performance degradation is insignificant in the case of LECTOR, and we overcome the major drawback of the forced stack technique.

TABLE IV  
NOISE ANALYSIS OF VARIOUS GATES

| CMOS Gate | Conventional gate: Max Noise | Conventional gate: Sensitivity | LCT gate: Max Noise | LCT gate: Sensitivity |
|-----------|--------|---------|--------|---------|
| Inverter  | 0.3651 | -0.2744 | 0.4008 | -0.1517 |
| NAND2     | 0.2786 | -0.1986 | 0.3427 | -0.2025 |
| NOR2      | 0.4129 | -0.2241 | 0.4830 | -0.2543 |
| AOI22     | 0.3560 | -0.2840 | 0.3829 | -0.3642 |
| OAI22     | 0.3658 | -0.2864 | 0.4122 | -0.2938 |

Table IV shows the effect of noise on various LCT gates and their corresponding conventional gates. Noise analysis was performed by creating a SPICE deck of the circuits and simulating it using HSPICE. The experimental setup is shown in Fig. 5. Our experiments are based on 70-nm process parameters and a temperature of 25°C. A switching waveform is applied at node *A* to generate a noise waveform on net *N*. The waveform on net *N* is denoted as $X(t)$, the response on the fanout of net *N* as $Y(t)$, and the transient sensitivity as $S(t)$ [Fig. 5(a)]. If the waveform of net *N* is changed slightly by a constant voltage *D*, i.e., $X_1(t) = X(t) + D$, then the response of the fanout of net *N* is $Y_1(t)$ [Fig. 5(b)]. The transient sensitivity $S(t)$ is defined as the ratio of the change in the output voltage to the change in the input voltage when the input is changed by a very small dc offset, given by (4)

$$S(t) = \lim_{D \rightarrow 0} \frac{Y_1(t) - Y(t)}{X_1(t) - X(t)} = \lim_{D \rightarrow 0} \frac{Y_1(t) - Y(t)}{D}. \quad (4)$$

The sensitivity reported in Table IV is the maximum value of the transient sensitivity. The sensitivity gives a measure of how the fanout of net *N* reacts to the noise waveform on net *N* and indicates whether the noise is amplified or attenuated. If the sensitivity of a net is greater than one, then the net is said to have a noise failure. It can be seen from Table IV that the LCT gates are stable under noise injection and that their noise sensitivity values are very close to those of conventional gates. This shows that the proposed LECTOR technique is stable to noise fluctuations.
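The definition in (4) can be illustrated numerically with a finite-difference estimate. The sketch below is not the HSPICE setup of Fig. 5: it assumes a hypothetical, memoryless inverter transfer curve (`toy_inverter`) and a triangular noise pulse in normalized volts, and it compares the magnitude of the peak sensitivity against one, which is our reading of the noise-failure criterion. All names and parameter values are illustrative.

```python
import math


def toy_inverter(x, gain=2.0, vm=0.5):
    """Hypothetical normalized inverter transfer curve (not a SPICE model)."""
    return 0.5 * (1.0 - math.tanh(gain * (x - vm)))


def noise_waveform(t, peak=0.3, center=50, half_width=20):
    """Triangular noise pulse X(t) on net N, in normalized volts."""
    return peak * max(0.0, 1.0 - abs(t - center) / half_width)


def transient_sensitivity(gate, x_samples, d=1e-6):
    """Finite-difference estimate of S(t) = (Y1(t) - Y(t)) / D for a small offset D."""
    return [(gate(x + d) - gate(x)) / d for x in x_samples]


x = [noise_waveform(t) for t in range(101)]      # X(t) sampled at t = 0..100
s = transient_sensitivity(toy_inverter, x)        # S(t) at the same sample points
peak_sensitivity = max(s, key=abs)                # value with the largest magnitude
noise_failure = abs(peak_sensitivity) > 1.0       # assumed reading of the criterion

print(f"peak sensitivity = {peak_sensitivity:.4f}, noise failure = {noise_failure}")
```

For an inverting gate the sensitivity is negative, as in Table IV, and a magnitude below one indicates that the injected noise is attenuated rather than amplified at the fanout.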

## V. REDUCING AREA OVERHEAD IN IMPLEMENTATION OF LECTOR

Fig. 5. Setup used for performing the noise analysis using HSPICE.

A method to control the leakage power in a two-input NAND gate was described above. If we extend this idea to all the basic logic gates, then the area overhead on the entire circuit will increase up to an upper bound of 60% depending on the logic gates used. The area overhead on various logic gates implemented using LCTs is tabulated in Table V. Here, we present a generalized approach which reduces the area overhead of the overall circuit along with significant leakage reduction.

Consider a four-input AND-OR-inverter (AOI22) gate. Fig. 6 shows two different implementations of the gate. The first implementation uses two two-input AND gates, one two-input OR gate, and an inverter, requiring a total of 20 transistors for its realization. A second implementation shows an equivalent compound/complex gate realization. A compound gate is formed by the combination of series and parallel MOS structures with complementary pull-up and pull-down logic [18]. The required series and parallel combinations of transistors are generated by analyzing the relevant Karnaugh map for both n- and p-logic structures and using DeMorgan's theorems. As these gates are built on static CMOS technology, they are called static CMOS complex gates (SCCG) [18]. SCCG implementation of an AOI22 gate needs eight transistors and hence is an efficient method for its realization. We have chosen to use SCCG gates for LECTOR as it not only saves area but also is well suited for inserting leakage control transistors. This is because SCCG gates have exactly one common node which connects the pull-up and pull-down circuits of the gate. Hence, in this gate, all the paths from the supply voltage to ground would pass through this node. Inserting LCTs at this node would control the leakage currents flowing through all the paths in the gate, thereby saving significantly.
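The complementary pull-up and pull-down structure of the AOI22 compound gate can be checked exhaustively. The sketch below assumes the usual static CMOS arrangement: the NMOS network is (A in series with B) in parallel with (C in series with D), and the PMOS network is its De Morgan dual. The function names are illustrative.

```python
from itertools import product


def pull_down_conducts(a, b, c, d):
    """NMOS network: (A series B) in parallel with (C series D)."""
    return (a and b) or (c and d)


def pull_up_conducts(a, b, c, d):
    """PMOS network (De Morgan dual): (A' parallel B') in series with (C' parallel D')."""
    return ((not a) or (not b)) and ((not c) or (not d))


def aoi22(a, b, c, d):
    """Logic function realized by the gate: NOT(A.B + C.D)."""
    return not ((a and b) or (c and d))


rows = []
for bits in product([False, True], repeat=4):
    rows.append((bits, pull_up_conducts(*bits), pull_down_conducts(*bits)))

# Exactly one network conducts for every input combination.
complementary = all(up != down for _, up, down in rows)
# The output is high exactly when the pull-up network conducts.
matches_function = all(up == aoi22(*bits) for bits, up, _ in rows)
# One NMOS and one PMOS transistor per input.
transistor_count = 2 * 4

print(complementary, matches_function, transistor_count)
```

Because both networks meet at a single output node, two LCTs placed at that node lie on every supply-to-ground path of the gate.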

TABLE V  
AREA OVERHEAD FOR AN LCT GATE

| CMOS Gate Type | Transistors in Unmodified Gate | Transistors in LCT Gate |
|----------------|--------------------------------|-------------------------|
| 2-input NAND      | 4                        | 6        |
| 2-input NOR       | 4                        | 6        |
| 2-input AND       | 6                        | 10       |
| 2-input OR        | 6                        | 10       |
| 3-input NAND      | 6                        | 8        |
| 3-input NOR       | 6                        | 8        |
| 3-input AND       | 8                        | 12       |
| 3-input OR        | 8                        | 12       |
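The transistor counts in Table V follow a simple pattern if AND and OR gates are assumed to be built as a NAND or NOR stage followed by an inverter, with two LCTs added to each stage. That two-stage structure is our assumption, inferred from the counts; it is not stated explicitly in the table.

```python
LCTS_PER_STAGE = 2  # one PMOS and one NMOS leakage control transistor per CMOS stage

# gate name -> (transistors in unmodified gate, number of CMOS stages)
# Stage counts are an assumption: AND/OR = NAND/NOR stage plus an inverter stage.
GATES = {
    "2-input NAND": (4, 1),
    "2-input NOR":  (4, 1),
    "2-input AND":  (6, 2),
    "2-input OR":   (6, 2),
    "3-input NAND": (6, 1),
    "3-input NOR":  (6, 1),
    "3-input AND":  (8, 2),
    "3-input OR":   (8, 2),
}


def lct_transistor_count(unmodified, stages):
    """Transistor count after inserting two LCTs into every stage."""
    return unmodified + LCTS_PER_STAGE * stages


lct_counts = {name: lct_transistor_count(*spec) for name, spec in GATES.items()}
for name, count in lct_counts.items():
    print(f"{name}: {GATES[name][0]} -> {count}")
```

This is why a single-stage compound gate is attractive for LECTOR: it pays for only one pair of LCTs regardless of the logic function it realizes.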



Fig. 6. Two implementations of AND-OR-inverter gate.

The limitation of SCCG gates is that if the number of transistors in series exceeds an upper limit in any path of the pull-up or pull-down logic, the propagation delay of the gate is adversely affected. Typically, this upper bound can be safely fixed at three or even four transistors. The leakage power in the AOI22 gate can be controlled by adding the two LCTs between the pull-up and pull-down logic of the implementation shown in Fig. 6(b). Fig. 7 shows the general scheme for converting any SCCG gate to a leakage-controlled gate.

Fig. 7. Generalized structure for leakage-controlled gates.

Any CMOS logic circuit can be expressed in terms of its gate-level netlist, which is therefore the starting point of our design methodology. Fig. 8 describes the sequence of steps for obtaining a leakage-controlled circuit from the gate-level netlist. The gate-level netlist is fed as input to a script, which generates the corresponding Berkeley Logic Interchange Format (BLIF) description. The Sequential and Combinational circuit Synthesis System (SIS tool) [19] or the TRAnsistor Binding Using COmplex gates (TRABUCO) [20] tool can be used for performing technology mapping of the BLIF netlist. Technology mapping comprises a covering phase and a matching phase. SIS is a technology-dependent mapping tool and hence uses a standard set of gates predefined in its library for performing technology mapping.
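The netlist-to-BLIF conversion step can be sketched as follows. This is not the script used in the experiments; the in-memory netlist representation (a list of `(gate type, inputs, output)` tuples) and the function name `netlist_to_blif` are hypothetical. Only the basic BLIF constructs `.model`, `.inputs`, `.outputs`, `.names`, and `.end` are used, with each gate written as a single-output cover of its on-set.

```python
# On-set covers in BLIF notation for a few gate types ("-" is a don't-care).
COVERS = {
    "AND":  ["11 1"],
    "OR":   ["1- 1", "-1 1"],
    "NAND": ["0- 1", "-0 1"],
    "NOR":  ["00 1"],
    "NOT":  ["0 1"],
}


def netlist_to_blif(model, inputs, outputs, gates):
    """Emit BLIF text for a netlist given as (gate_type, input_nets, output_net) tuples."""
    lines = [
        f".model {model}",
        ".inputs " + " ".join(inputs),
        ".outputs " + " ".join(outputs),
    ]
    for gate_type, gate_inputs, gate_output in gates:
        lines.append(".names " + " ".join(list(gate_inputs) + [gate_output]))
        lines.extend(COVERS[gate_type])
    lines.append(".end")
    return "\n".join(lines) + "\n"


# Hypothetical example: AND-OR-invert function built from two ANDs and a NOR.
aoi22_netlist = [
    ("AND", ["a", "b"], "n1"),
    ("AND", ["c", "d"], "n2"),
    ("NOR", ["n1", "n2"], "y"),
]
blif_text = netlist_to_blif("aoi22", ["a", "b", "c", "d"], ["y"], aoi22_netlist)
print(blif_text)
```

A technology mapper can then cover the three `.names` nodes of this example with a single AOI22 compound gate if one is available in its library.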

A library of SCCG gates with a maximum of two series transistors each was created. The constraint on the maximum number was induced in the topology for performance reasons. The covering phase in the SIS tool is performed by representing the netlist in the form of a tree called the parent tree. The matching is done by dividing the parent tree into subtrees. Each subtree is then replaced with the logic gate in the library whose tree representation matches efficiently [19].

TRABUCO, on the other hand, is a technology-independent mapping tool, which maps the netlist onto a virtual library of SCCG gates. The tool creates the virtual library of SCCG gates dynamically and hence does not need any library support. The set of SCCG gates used by the tool depends on the number of serial transistors specified by the user. The covering scheme used by this tool is based on a dynamic tree covering approach similar to the 0-1 knapsack problem [20]. This approach generates a minimal number of multi-input SCCGs. The circuit with SCCG gates obtained by the SIS or TRABUCO tool from the input netlist is then converted to a LECTOR-style circuit by introducing LCTs as shown in Fig. 7. The LECTOR circuit is then converted to SPICE format with the help of a script for simulation with the HSPICE simulator.

## VI. EXPERIMENTAL RESULTS

Fig. 8. General design flow.

The LECTOR technique was implemented and tested on MCNC'91 benchmark circuits. First, the MCNC'91 benchmark netlists were converted to BLIF using a script. The SIS tool takes the BLIF input and performs technology mapping to SCCG gates. The SIS tool provides various scripts, which aid in mapping the BLIF to the needs of the user. The *read\_blif* script was used for reading the BLIF netlist into the SIS tool. The *read\_library* script was invoked to read the library of gates from a file in *genlib* format. The SIS tool performs the covering phase of technology mapping with a tree-covering algorithm and the logical decomposition of gates using the *tech\_decomp* script. The matching phase of the technology mapping was completed with the *map* script optimized for area.

The number of transistors in the path from $V_{dd}$ to ground influences the operation of LCT gates. As the number of transistors in the path increases, the body effect becomes prominent, resulting in incorrect logic switching. We have therefore restricted the number of transistors on any path from $V_{dd}$ to ground of LCT gates to six (three PMOS and three NMOS, including the LCTs). Hence, an SCCG library with gates constrained to two series transistors was used for generating LECTOR-style circuits. The eight different SCCG gates which can be realized with two series transistors are the inverter, the buffer, two-input NAND and NOR gates, and two-input AOI and OAI gates (AOI22, AOI21, OAI22, OAI21). After mapping the BLIF to the SCCG library, these gates are replaced with leakage-controlled gates, designed by adding two LCTs to each SCCG gate, to obtain an LCT MCNC circuit. The addition of LCTs raises the number of series transistors in LCT gates to three.

In order to make a fair comparison with the circuit implemented without LECTOR, we created a library of gates containing at most three series transistors. This library contains an inverter, a buffer, two- and three-input NAND, NOR, OR, and AND gates, and a two-input XOR gate. The SIS tool is used to map the BLIF netlist onto this library to obtain an unmodified MCNC circuit (referred to as the U-MC circuit). We have analyzed the critical path delays of both the LCT MCNC circuit (referred to as the LCT-MC circuit) and the corresponding U-MC circuit using HSPICE. The transistors of LCT gates present on the critical path of the LCT-MC circuit are sized such that the critical path delay is kept almost equal to that of the U-MC circuit. This allows a fair and direct comparison of LECTOR and U-MC circuits with respect to power dissipation. The HSPICE simulator was preferred for measuring the leakage power due to its accuracy. Simulations were performed assuming 70-nm fabrication process parameters generated using the BSIM predictive models with temperature set to 25°C.

TABLE VI  
EXPERIMENTAL RESULTS FOR MCNC '91 BENCHMARKS

| MCNC '91 Circuit | Leak. Pwr ($\mu\text{W}$), U-MC | Leak. Pwr ($\mu\text{W}$), LCT-MC | Dyn. Pwr ($\mu\text{W}$), U-MC | Dyn. Pwr ($\mu\text{W}$), LCT-MC | Total Pwr ($\mu\text{W}$), U-MC | Total Pwr ($\mu\text{W}$), LCT-MC | Normalized area | % Leakage savings by prior technique | % Leakage savings by LECTOR |
|-------|-------|-------|-------|-------|-------|-------|------|--------|--------|
|       |       |       |       |       |       |       |      | Stacks [10] with 16% delay overhead | LECTOR with no delay overhead |
| I1    | 1.159 | 0.156 | 1.122 | 1.094 | 1.275 | 0.291 | 1.21 | 67.34% | 86.54% |
| I2    | 2.305 | 0.735 | 2.420 | 2.661 | 2.375 | 0.807 | 1.14 | 61.61% | 68.11% |
| I3    | 1.383 | 0.419 | 1.063 | 1.025 | 1.451 | 0.712 | 1.18 | 61.97% | 69.70% |
| I4    | 2.356 | 0.632 | 2.043 | 1.995 | 2.436 | 0.756 | 1.12 | 55.03% | 73.17% |
| I5    | 4.625 | 0.475 | 5.905 | 5.945 | 5.473 | 0.732 | 1.19 | 83.51% | 89.72% |
| I6    | 6.906 | 1.912 | 18.62 | 18.85 | 12.70 | 6.495 | 1.13 | 60.74% | 72.31% |
| I7    | 8.933 | 3.126 | 25.62 | 25.59 | 17.95 | 9.248 | 1.12 | 51.39% | 65.01% |
| I8    | 30.05 | 5.038 | 59.06 | 58.34 | 41.08 | 10.56 | 1.08 | 74.32% | 83.23% |
| I9    | 21.90 | 2.897 | 38.79 | 37.24 | 25.75 | 8.982 | 1.12 | 85.15% | 86.77% |
| I10   | 40.47 | 5.842 | 54.58 | 52.13 | 43.31 | 11.23 | 1.15 | 83.39% | 85.56% |
|       |       |       |       |       |       |       |      | Dual $V_T$ [14] with up to 64% delay overhead | LECTOR with no delay overhead |
| C432  | 1.395 | 0.672 | 6.970 | 6.889 | 1.923 | 1.260 | 1.17 | 28.83% | 51.82% |
| C499  | 3.469 | 1.444 | 16.38 | 16.03 | 5.037 | 2.990 | 1.15 | 22.96% | 58.37% |
| C880  | 6.141 | 1.154 | 9.472 | 9.502 | 7.972 | 2.493 | 1.18 | 82.67% | 81.21% |
| C1355 | 8.089 | 1.672 | 16.08 | 15.93 | 9.754 | 3.104 | 1.11 | 21.50% | 79.32% |
| C1908 | 19.61 | 1.926 | 28.43 | 27.10 | 24.53 | 3.613 | 1.13 | 84.92% | 90.18% |
| C2670 | 52.17 | 2.845 | 70.45 | 69.59 | 65.70 | 4.338 | 1.19 | 90.25% | 94.54% |
| C3540 | 64.79 | 3.852 | 81.65 | 76.49 | 73.41 | 4.588 | 1.14 | 83.36% | 94.05% |
| C5315 | 82.58 | 4.826 | 99.21 | 97.60 | 88.50 | 5.892 | 1.12 | 91.56% | 94.16% |
| C6288 | 163.7 | 9.725 | 520.4 | 502.2 | 390.2 | 30.23 | 1.10 | 61.75% | 94.05% |
| C7552 | 323.2 | 10.24 | 576.2 | 566.6 | 393.3 | 22.96 | 1.08 | 90.90% | 96.86% |

The critical path delays of U-MC and LCT-MC circuits are made equal by sizing LCT gates on the critical path. The leakage power reduction results indicated for techniques [10] and [14] have high delay and area penalties, whereas those indicated for LECTOR are with zero delay penalty and a much smaller area penalty. Normalized area is the ratio of layout areas of LCT-MC and U-MC circuits.

Leakage power dissipation was measured by exciting both circuits with the same set of randomly generated input vectors. After applying one input vector, the circuit is made to wait long enough before it is excited with the next input vector. This allows the circuit's switching activity to die down, after which the power dissipated is due to leakage currents. We measured the power during this time period and averaged it over all the input vectors to obtain the average leakage power dissipation for each circuit. We also measured the average total power dissipated in each case. To measure the dynamic power dissipation, we drove both circuits with the same excitation such that the switching activity is very high. The input vectors are fed at a faster rate, not allowing the circuit to settle down. Hence, in this case, the contribution of leakage power dissipation is minimal.

The experimental results are listed in Table VI. The first column lists the MCNC'91 benchmark circuits. The average leakage power dissipation (in microwatts) for U-MC and LCT-MC circuits are given in columns two and three, respectively. Columns six and seven provide the average total power dissipation (in microwatts) for the respective circuits. Both the leakage and total power estimates were obtained by averaging the circuit simulations over ten different sets of 500 randomly generated (with a probability of 0.5) input patterns. The input patterns were triggered with long time intervals between them so as to have minimum circuit switching activity. Thus, it was ensured that the major contributor to the power dissipation is leakage power.
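The stimulus generation and averaging described above can be sketched as follows. This is an illustration under stated assumptions rather than the script used in the experiments: the number of primary inputs (8), the seed, and the helper names are hypothetical, and the per-vector power values would come from the circuit simulator, which is not modeled here.

```python
import random


def random_vector_sets(num_inputs, num_sets=10, vectors_per_set=500, p_one=0.5, seed=0):
    """Generate num_sets sets of random input vectors; each bit is 1 with probability p_one."""
    rng = random.Random(seed)
    return [
        [[1 if rng.random() < p_one else 0 for _ in range(num_inputs)]
         for _ in range(vectors_per_set)]
        for _ in range(num_sets)
    ]


def average_power(per_set_power):
    """Average per-vector power within each set, then average across the sets."""
    set_means = [sum(values) / len(values) for values in per_set_power]
    return sum(set_means) / len(set_means)


# Hypothetical circuit with 8 primary inputs; ten sets of 500 patterns each.
vector_sets = random_vector_sets(num_inputs=8)
ones = sum(bit for s in vector_sets for vec in s for bit in vec)
fraction_of_ones = ones / (10 * 500 * 8)
print(f"{len(vector_sets)} sets, fraction of ones = {fraction_of_ones:.3f}")
```

The same generator with `vectors_per_set=1000` would correspond to the pattern count quoted for the dynamic power runs; only the rate at which the vectors are applied differs between the two measurements.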

The dynamic power estimates (in microwatts) obtained for U-MC and LCT-MC circuits are indicated in columns four and five, respectively. Ten different sets of 1000 randomly generated (with a probability of 0.5) input patterns are used to estimate dynamic power in both cases. The inputs were applied in quick succession so as to maintain high switching activity in the circuit. Hence, the major contributor to the power dissipation in this case is dynamic power. The normalized area of each LCT-MC circuit with respect to its corresponding U-MC circuit is given in column eight. We created standard cell layouts of all the gates used in the synthesis of both LCT-MC and U-MC circuits. Cadence Silicon Ensemble was used for placement and routing of the netlists using the standard cell layouts. The layout areas required by the circuits were measured to calculate the area overheads of the LCT-MC circuits with respect to the U-MC circuits.

The results show an average reduction of 79.4% in average leakage power dissipation with an average area overhead of 14%. It should be noted that the leakage savings are obtained without any significant sacrifice in dynamic power and with zero delay penalty. Columns nine and ten of Table VI compare the leakage savings obtained for MCNC'91 benchmarks using LECTOR and the techniques from [10] and [14]. We have chosen the technique based on transistor stacks [10] as it works on the same basic idea as LECTOR. The dual threshold voltage technique combined with delay balancing and retiming, proposed in [14], is chosen for comparison because of its significant results in the dual threshold voltage category. For circuits C432–C7552, we compare our results with the dual threshold voltage technique [14], and for circuits I1–I10, we compare with transistor stacks [10].

It should be noted that the leakage reduction results shown in [10] suffer from delay and area penalties. The leakage control transistors in the transistor stacks technique are sized to 30% of the total sizing required to achieve zero delay penalty, which results in an average delay overhead of 16%. The area overhead due to additional transistors and control circuitry is about 46.3% to 50%. Also, the savings indicated for the transistor stacks technique are obtained on application of the *low-leakage input vectors* and neglect the dynamic power dissipation due to control circuitry. In general, the dynamic power dissipation due to control circuitry is very significant and may overshadow the savings in leakage power, depending on the switching activity of the control logic.

In the case of the dual $V_T$ technique [14], the leakage savings given are with respect to benchmark circuits without technology mapping. When technology mapping is performed before calculating the leakage reduction, the leakage savings will be significantly less for these circuits than those shown in Table VI. Also, the results given are based on the maximum allowable delays for the gates with the threshold voltage scaled to as high as 0.5 $V_{dd}$. This would result in a delay penalty of anywhere between 5% and 64%, as pointed out in [10]. In spite of these dissimilarities in the calculation of leakage power savings, we can observe from Table VI that LECTOR provides significantly better results.
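The per-circuit LECTOR savings in column ten of Table VI can be cross-checked against the leakage power columns, and the average area overhead against column eight. The sketch below assumes that the savings are computed as the relative reduction in leakage power from U-MC to LCT-MC; the formula is not stated in the text, and only a few rows are checked here. The names are illustrative.

```python
# (U-MC leakage, LCT-MC leakage) in microwatts, copied from Table VI.
leakage_uW = {
    "I1":    (1.159, 0.156),
    "I2":    (2.305, 0.735),
    "C432":  (1.395, 0.672),
    "C1355": (8.089, 1.672),
}

# Normalized area (LCT-MC / U-MC) for all twenty circuits, in table order.
normalized_area = [
    1.21, 1.14, 1.18, 1.12, 1.19, 1.13, 1.12, 1.08, 1.12, 1.15,   # I1..I10
    1.17, 1.15, 1.18, 1.11, 1.13, 1.19, 1.14, 1.12, 1.10, 1.08,   # C432..C7552
]


def leakage_savings_percent(unmodified, lct):
    """Relative leakage reduction of the LCT-MC circuit (assumed formula)."""
    return 100.0 * (1.0 - lct / unmodified)


savings = {name: leakage_savings_percent(u, l) for name, (u, l) in leakage_uW.items()}
mean_area_overhead = 100.0 * (sum(normalized_area) / len(normalized_area) - 1.0)

for name, value in savings.items():
    print(f"{name}: {value:.2f}% leakage savings")
print(f"mean area overhead: {mean_area_overhead:.1f}%")
```

For the rows checked, the computed savings agree with the tabulated values to within rounding, and the mean of the normalized area column corresponds to the quoted area overhead of about 14%.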

## VII. CONCLUSIONS

The scaling down of device dimensions, supply voltage, and threshold voltage for achieving high performance and low dynamic power dissipation has largely contributed to the increase in leakage power dissipation. With deep-submicron and nanometer technologies, the leakage current becomes more critical in portable systems where battery life is of prime concern. We have presented an efficient design methodology for reducing the leakage power in CMOS circuits. LECTOR yields better leakage reduction as the threshold voltage decreases and hence aids in further reduction of supply voltage and minimization of transistor sizes. Unlike other leakage control techniques, LECTOR does not need any control circuitry to monitor the states of the circuit. Hence, LECTOR avoids the sacrifice of obtained leakage power reduction in the form of dynamic power consumed by the additional circuitry to control the overall circuit states. Experimental results show that our technique yields average leakage reduction of about 79.4% for the same critical path delay in comparison to the unmodified circuit for MCNC'91 benchmarks with an area overhead of 14%.

## ACKNOWLEDGMENT

The authors wish to thank the anonymous reviewers for all their comments that helped in improving the quality of the manuscript.

## REFERENCES

- [1] H. J. M. Veendrick, "Short circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits," *IEEE J. Solid-State Circuits*, vol. SC-19, pp. 468–473, Aug. 1984.
- [2] A. P. Chandrakasan and R. W. Brodersen, "Minimizing power consumption in digital CMOS circuits," *Proc. IEEE*, vol. 83, pp. 498–523, Apr. 1995.
- [3] R. X. Gu and M. I. Elmasry, "Power dissipation analysis and optimization for deep submicron CMOS digital circuits," *IEEE J. Solid-State Circuits*, vol. 31, pp. 707–713, May 1996.
- [4] Q. Wang and S. Vrudhula, "Static power optimization of deep sub-micron CMOS circuits for dual  $V_T$  technology," in *Proc. ICCAD*, Apr. 1998, pp. 490–496.
- [5] B. J. Sheu, D. L. Scharfetter, P. K. Ko, and M. C. Jeng, "BSIM: Berkeley short-channel IGFET model for MOS transistors," *IEEE J. Solid-State Circuits*, vol. SC-22, pp. 558–566, Aug. 1987.
- [6] S. Thompson, P. Packan, and M. Bohr, "MOS scaling: Transistor challenges for the 21st century," *Intel Technol. J.*, vol. Q3, 1998.
- [7] J. P. Halter and F. Najm, "A gate-level leakage power reduction method for ultralow-power CMOS circuits," in *Proc. IEEE Custom Integrated Circuits Conf.*, 1997, pp. 475–478.
- [8] D. Duarte, Y.-F. Tsai, N. Vijay Krishnan, and M. J. Irwin, "Evaluating run-time techniques for leakage power reduction," in *7th Proc. ASP-DAC*, 2002, pp. 31–38.
- [9] M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, and T. N. Vijaykumar, "Gated-Vdd: A circuit technique to reduce leakage in deep submicron cache memories," in *Proc. IEEE ISLPED*, 2000, pp. 90–95.
- [10] M. C. Johnson, D. Somasekhar, L. Y. Chiu, and K. Roy, "Leakage control with efficient use of transistor stacks in single threshold CMOS," *IEEE Trans. VLSI Syst.*, vol. 10, pp. 1–5, Feb. 2002.
- [11] J. Kao, A. Chandrakasan, and D. Antoniadis, "Transistor sizing issues and tool for multi-threshold CMOS technology," in *Proc. 34th DAC*, 1997, pp. 409–414.
- [12] C. Gopalakrishnan and S. Katkoori, "Resource allocation and binding approach for low leakage power," in *Proc. IEEE Int. Conf. VLSI Design*, Jan. 2003, pp. 297–302.
- [13] L. Wei, Z. Chen, M. Johnson, and K. Roy, "Design and optimization of low voltage high performance dual threshold CMOS circuits," in *Proc. 35th DAC*, 1998, pp. 489–492.
- [14] V. Sundarajan and K. K. Parhi, "Low power synthesis of dual threshold voltage CMOS VLSI circuits," *Proc. IEEE ISLPED*, pp. 139–144, 1999.
- [15] S. Rele, S. Pande, S. Onder, and R. Gupta, "Optimizing static power dissipation by functional units in superscalar processors," in *Proc. Int. Conf. Compiler Construction*, Apr. 2002, pp. 261–275.
- [16] S. Narendra, S. Borkar, V. De, D. Antoniadis, and A. P. Chandrakasan, "Scaling of stack effect and its application for leakage reduction," *Proc. IEEE ISLPED*, pp. 195–200, Aug. 2001.
- [17] S. Sirichotiyakul, T. Edwards, C. Oh, R. Panda, and D. Blaauw, "Duet: An accurate leakage estimation and optimization tool for dual- $V_T$  circuits," *IEEE Trans. VLSI Syst.*, vol. 10, pp. 79–90, Apr. 2002.
- [18] N. H. E. Weste and K. Eshraghian, Eds., *Principles of CMOS VLSI Design: A Systems Perspective*, 2nd ed. Reading, MA: Addison-Wesley, 1993.
- [19] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and A.-S. Vincentelli, Eds., *SIS: A System for Sequential Circuit Synthesis*. Berkeley, CA: Univ. of California Press, 1992.
- [20] A. I. Reis, "Covering strategies for library free technology mapping," in *Proc. XII Integrated Circuits and Systems Design*, 1999, pp. 180–183.



**Narender Hanchate** (S'01) received the B.E. degree in electronics and communication engineering from Osmania University, Hyderabad, India, in 2000 and the M.S. degree in computer science and engineering from the University of South Florida, Tampa, in 2003, where he is currently working toward the Ph.D. degree in computer science and engineering.

From 2000 to 2001, he was a Design Engineer in the field of VLSI and Embedded systems with HCL Technologies Ltd., Chennai, India. His research interests include the development of methodologies for low-power synthesis, interconnect optimization, and crosstalk analysis of deep-submicron circuits.

Mr. Hanchate is a student member of the IEEE Computer Society.



**Nagarajan Ranganathan** (M'88–SM'92–F'02) received the B.E. (Honors) degree in electrical and electronics engineering from Regional Engineering College, Tiruchirapalli, University of Madras, India, in 1983 and the Ph.D. degree in computer science from the University of Central Florida, Orlando, in 1988.

He is currently a Professor in the Department of Computer Science and Engineering and the Nanomaterials and Nanomanufacturing Research Center at the University of South Florida, Tampa. During 1998–1999, he was a Professor at the University of Texas at El Paso. His research interests include VLSI design, design automation, computer architecture, and parallel processing. He has developed many special-purpose VLSI chips for computer vision, image processing, pattern recognition, data compression, and signal processing applications. He has published over 175 papers in reputed journals and conferences and is a co-owner of five U.S. patents.

Dr. Ranganathan is a member of the IEEE Computer Society and IEEE Circuits and Systems Society. He has served on the editorial boards for *Pattern Recognition* (1993–1997), *VLSI Design* (1994–present), IEEE TRANSACTIONS ON VLSI SYSTEMS (1995–1997), IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS (1997–1999), and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (1997–2000). He was the Chair of the IEEE Computer Society Technical Committee on VLSI during 1997–2001. He served as the Steering Committee Chair of the IEEE TRANSACTIONS ON VLSI SYSTEMS during 2001–2002 and is serving as the Editor-in-Chief for 2003–2004. He was elected as a Fellow of IEEE in 2002 for his contributions to algorithms and architectures for VLSI systems. Recently, he received the Theodore and Venette Askounes-Ashford Distinguished Scholar Award from the University of South Florida, Tampa.