

# **Design and Implementation of Low Power Synchronous 3-Bit Counter for Energy Optimization**

**A Project Report of  
ECE 802 Low Power VLSI Design**

**By  
Anush (201EC108)  
Aniket Uppin (201EC169)**

**Under the guidance of  
Dr.KALPANA G.BHAT**



**NATIONAL INSTITUTE OF TECHNOLOGY  
KARNATAKA, SURATHKAL**

# **Design and Implementation of Low-Power CMOS Synchronous Counter for Energy Optimization.**

## **Introduction**

The design of energy-efficient digital circuits is becoming increasingly important in modern electronic systems. With the growing demand for portable and wearable devices, reducing power consumption in digital circuits has become critical to extending battery life and improving overall device performance.

Counters are fundamental building blocks in digital circuits, used for counting events and generating timing signals. Synchronous counters are particularly important, as they are used for various applications such as frequency division, data acquisition, and digital signal processing. However, synchronous counters based on flip-flops can consume considerable power due to their continuous clocking operation.

One common approach to achieving low power consumption in digital circuits is using sleep transistors, which reduce leakage current when part of the circuit is idle. This lowers the leakage power, which contributes an increasingly significant share of the total power consumption as the feature size decreases. Another technique is clock gating, which delivers the clock signal only to the parts of the circuit that need it. This reduces the switching activity and therefore the dynamic power consumption.

In addition to sleep techniques, flip-flops contribute significantly to power consumption in counters. To mitigate this, clock gating can reduce flip-flop dynamic power consumption. Clock gating selectively disables the clock signal to flip-flops that do not need to change state, lowering their switching activity and thereby reducing overall power consumption without affecting the circuit's functionality. D flip-flops designed with dynamic logic can also be used in counter design to reduce power consumption. Compared to static CMOS D flip-flops, dynamic D flip-flops use fewer transistors and can operate at a higher frequency, which lowers power consumption. Using dynamic D flip-flops therefore makes it possible to design more power-efficient counters.

## Objective:

This project aims to develop and optimize a low-power synchronous binary 3-bit counter for energy-efficient applications. The design focuses on sleep techniques, such as introducing sleep transistors, and on efficient clock gating for power reduction at the gate level. Clock gating is used to reduce the power consumption of the flip-flops while they are in the memory state. The proposed design is expected to achieve significant dynamic and static power reduction through these techniques while maintaining the desired functionality and performance of the synchronous up counter.

We also aim to design a counter that can easily be scaled to a larger number of bits.

## Design:

The 3-bit counter consists of three D flip-flops and logic gates to implement the counter functionality. The logic gates are implemented using CMOS technology, which provides low power consumption and high noise immunity. Each D flip-flop is created as a CMOS logic device utilizing NAND gates and sleep transistors in a master-slave configuration [1]. Clock gating logic is designed so that whenever there is a data stream at ‘D’ input, the switching of the gated clock occurs depending on the status of the present and previous data.



Figure 1: D-flip flop circuit diagram

## Power optimization:

Sleep-transistor power gating puts the counter in sleep mode when it is not in use. The sleep transistors are connected in series with the power supply of the counter and are controlled by a sleep signal. We have used both a header and a footer sleep transistor, as in [1]. When the sleep signal is high, the sleep transistors turn off and the counter enters sleep mode. In this mode, the leakage power is brought down to nearly zero because the sleep transistors are in the cutoff region, so there is no conducting path from the supply to ground.



Figure 2: Sleep transistor logic

In standby mode, clock gating is applied to the D flip-flops to reduce power consumption. Clock gating turns off the clock to a D flip-flop when it is not required, that is, in the memory state of the flip-flop, so that the flip-flop is active only when its input differs from its output, as in [2]. Clock gating is applied to all the flip-flops except the first one, since that flip-flop toggles in every clock cycle. An XOR operation is performed on the input to the flip-flop and the previous state of the flip-flop. A result of 0 identifies the memory state, during which the clock to that flip-flop can be turned off. The XOR result acts as the enable signal for the clock.
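The XOR-based enable described above can be illustrated with a small behavioural model. The sketch below is not the NGspice netlist used in this project; it is a cycle-level Python approximation that assumes ideal flip-flops and ignores edge polarity and gate delays. The function names `next_state_inputs` and `simulate` are hypothetical. It counts how many clock pulses actually reach each flip-flop over one full counting sequence.

```python
def next_state_inputs(q):
    """D inputs of a 3-bit synchronous up counter; q = (q0, q1, q2), LSB first."""
    q0, q1, q2 = q
    return (q0 ^ 1, q1 ^ q0, q2 ^ (q1 & q0))


def simulate(cycles=8):
    """Return the visited states and the clock pulses delivered to each flip-flop."""
    q = (0, 0, 0)
    pulses = [0, 0, 0]
    states = []
    for _ in range(cycles):
        states.append(q[0] + 2 * q[1] + 4 * q[2])
        d = next_state_inputs(q)
        # Enable = D XOR Q. The first flip-flop is left ungated (always clocked),
        # because it toggles in every cycle.
        enable = (1, d[1] ^ q[1], d[2] ^ q[2])
        new_q = []
        for i in range(3):
            if enable[i]:
                pulses[i] += 1        # clock pulse reaches this flip-flop
                new_q.append(d[i])
            else:
                new_q.append(q[i])    # memory state: clock suppressed
        q = tuple(new_q)
    return states, pulses


states, pulses = simulate(8)
print("states:", states)
print("clock pulses per flip-flop (LSB first):", pulses)
print("total pulses with gating:", sum(pulses), "without gating:", 3 * len(states))
```

In this idealized model the gated flip-flops receive a clock pulse only in the cycles where they change state, which is the source of the dynamic power saving; the actual saving in silicon also depends on the power drawn by the gating logic itself.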



Figure 3: Elementary clock gating logic



Figure 4: Proposed clock gating logic



Figure 5: D Flip-flop Implementation using Dynamic Logic [4]

### Simulation:

The design is simulated using NGspice, an open-source circuit simulator. The simulation results show that the counter operates correctly, and the power consumption is significantly reduced in sleep and standby modes.



Figure 6: This figure shows gated clocks for every flip flop along with the states of the 3-bit negative edge triggered counter.



Figure 7: This shows a full view of the 3-bit counter. Each state is displayed with the system clock, showcasing the correct counting sequence from 0 to 7.



Figure 8: This figure shows the states of the counter when the counter is in active mode.



Figure 9: This figure shows the state of the counter initially in sleep mode and a transition to active mode.



Figure 10: This figure shows the Magic layout of the D flip-flop.



Figure 11: This figure shows the D flip-flop logic output from the IRSIM analyzer.



Figure 12: This figure shows the magic layout of the 2-bit counter along with the clock gating logic.



Figure 13: This figure shows a positive edge-triggered counter using D flip-flops made with dynamic logic.



Figure 14: In this figure, we can observe that the switching of the states is sharp and clean with respect to the clock edge compared to the static CMOS logic in Figure 7.

## Results:

Power and delay analysis of the two-bit counter

|              | Normal Mode | Clock gated | With Sleep and clock gated |
|--------------|-------------|-------------|----------------------------|
| Delay (in s) | 1.23e-10    | 1.39e-10    | 1.39e-10                   |
| Power (in W) | 1.67e-05    | 1.523e-05   | 1.054e-05                  |

Power and delay analysis of the three-bit counter

|              | Normal Mode | Clock gated | With Sleep and clock gated |
|--------------|-------------|-------------|----------------------------|
| Delay (in s) | 1.24e-10    | 1.438e-10   | 1.51e-10                   |
| Power (in W) | 2.405e-05   | 1.935e-05   | 1.61e-05                   |

Power consumption plot



Delay plot



The power consumption of the 3-bit counter was measured under different conditions. Without any low-power design techniques, the counter consumed a total power of 24.05 uW. With clock gating, the power consumption was reduced by 19.54% to 19.35 uW. With clock gating and sleep transistors, the power consumption was reduced by 33.12% to 16.1 uW. Relative to the counter without low-power design techniques, the delay increased by about 16% with clock gating alone (124 ps to 143.8 ps) and by about 22% with both techniques (124 ps to 151 ps), which is manageable.
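The relative changes quoted above can be recomputed from the three-bit counter table. The following is a minimal sketch using only the tabulated values; the dictionary layout and the helper name `percent_change` are illustrative assumptions. Because the table entries are rounded, the computed figures may differ slightly in the last digit from percentages derived from unrounded simulation data.

```python
# Values copied from the three-bit counter table (power in W, delay in s).
results = {
    "normal": {"power": 2.405e-05, "delay": 1.24e-10},
    "clock_gated": {"power": 1.935e-05, "delay": 1.438e-10},
    "sleep_and_clock_gated": {"power": 1.61e-05, "delay": 1.51e-10},
}


def percent_change(new, base):
    """Signed change of `new` relative to `base`, in percent."""
    return (new - base) / base * 100.0


base = results["normal"]
for name, r in results.items():
    if name == "normal":
        continue
    d_power = percent_change(r["power"], base["power"])
    d_delay = percent_change(r["delay"], base["delay"])
    print(f"{name}: power {d_power:+.2f}%, delay {d_delay:+.2f}%")
```

A negative value indicates a reduction relative to the normal-mode counter, and a positive value indicates an increase.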

The flip-flop designed with dynamic logic gave promising results. When the dynamic flip-flops were incorporated into the clock-gated counter, the power consumption was 6.828 uW, which is about 71% less than that of the static-logic counter without any power optimization technique, and the delay was reduced to 94.68 ps. However, we have not included these results in the comparison above because the tables contain results obtained from the static CMOS design.

A possible improvement to the 3-bit counter design is to scale up the number of bits to create a multi-bit counter. This can be achieved by replicating the 3-bit counter design and cascading the outputs of each counter to create a higher-bit counter.

## **Conclusion:**

In conclusion, we have designed a 3-bit counter with power optimization using sleep transistors (power gating) for standby mode and clock gating of the D flip-flops. The counter is designed from scratch using a TSMC 65nm CMOS technology file: each subcircuit, including the logic gates and the D flip-flops, is created in NGspice with sleep transistors incorporated. The simulation results show that the counter operates correctly and that the power consumption is significantly reduced by the proposed power optimization techniques.

Furthermore, the implementation of flip-flops using dynamic logic instead of static CMOS logic showed remarkable improvement in power reduction. It is worth noting that the design of these flip-flops does not follow the master-slave configuration commonly used in static CMOS designs.

Our results show that low-power design techniques can significantly reduce the power consumption of digital circuits without introducing significant delays. In addition, dynamic logic can be used to reduce power consumption in flip-flops.

Overall, our project demonstrates the effectiveness of low-power design techniques and dynamic logic in reducing power consumption in digital circuits. These techniques can be applied in various applications where power consumption is critical, such as mobile devices and IoT devices.

### **Individual Contributions:**

#### **Aniket Uppin:**

Literature Survey and Detailed Study of research papers.

Designed layout of counter on Magic.

Deciding the design specifications, rules, and constraints.

Netlist implementation of static and dynamic circuits in NGspice.

Power calculation and analysis.

Documentation.

Hours of Work: 24

### **References and Citations:**

[1].Himal Pokhrel, Deepak Kumar, and Anjali Sharma, "Design and Analysis of 4-Bit Binary Synchronous Counter by Leakage Reduction Techniques", *International Journal of Computer Trends and Technology (IJCTT)* vol. 50, no. 2, August 2017

[2].Pritam Bhattacharjee, Alak Majumder, and Tushar Dhabal Das, "A 90 nm Leakage Control Transistor Based Clock Gating for Low Power Flip Flop Applications", *IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS)*, 16-19 October 2016, Abu Dhabi, UAE.

[3].Gianluca Giustolisi, Rosario Mita, Gaetano Palumbo, and Giuseppe Scotti, "A Novel Clock Gating Approach for the Design of Low-Power Linear Feedback Shift Registers.", *IEEE Access*

[4]. Flip-flop (electronics). (2023, March 31). In *Wikipedia*. <https://en.wikipedia.org/wiki/Flip-flop_(electronics)>

# Design and Analysis of 4- Bit Binary Synchronous Counter by Leakage Reduction Techniques

Himal Pokhrel<sup>#1</sup>, Deepak Kumar<sup>\*2</sup>, Anjali Sharma<sup>#3</sup>

<sup>#1,\*2</sup> MSC (Physics) Student, Department of Applied Sciences, Shimla University, H.P., India  
<sup>#3</sup> Assistant Professor, Department of ECE, Shimla University, H.P., India

**Abstract:** The counter is one of the fundamental and essential components used in most digital devices, and the design of power-efficient counters has become essential for researchers. In leakage-dominant technologies, leakage current increases for traditional CMOS structures due to the reduction in threshold voltage. The increase in leakage current due to voltage scaling causes an increase in static power dissipation. Researchers have implemented various techniques to design counters that consume the lowest power possible. In this paper, we present the design of a 4-bit binary synchronous counter using three different techniques, namely the CMOS technique, the Sleepy Transistor Technique (STT) and the Forced Stack Technique (FST). The circuit design and parametric analysis have been carried out using Microwind 3.1 and DSCH 3.1 software on 65nm technology. The height, width, surface area and power consumption for all three techniques have been measured at three different supply voltages, i.e. 0.5V, 0.7V and 0.9V. It is found that the power consumed by the FST counter and the STT counter is much less than the power consumed by the CMOS counter. The average power reduction is 44.9% in the case of STT and 70.1% in the case of FST as compared to the CMOS counter. Although these techniques are power efficient compared to the CMOS technique, this comes at the expense of larger surface area. Counters designed by these techniques can be useful where low power is the primary concern.

**Keywords** — CMOS, DSCH, FST, Leakage Power, Microwind, Synchronous Counter

## I. INTRODUCTION

A digital counter circuit consists of two or more flip-flops with combinational elements. The flip-flops change state in a prescribed sequence. A counter is used to count the number of occurrences of an event or input signal. Counters have a wide range of applications in circuits such as signal generators, microcontrollers, digital memories, digital clocks and timing circuits [1]. High-performance computers with low power consumption are needed by all designers. In CMOS technology, high density and high performance can be achieved by scaling down the feature size and threshold voltage. As the feature size decreases, the channel length becomes shorter, which increases the sub-threshold leakage current through a transistor during its off mode. So, the static power consumption increases. This is also called leakage power dissipation, and it has been increasing significantly as a share of the total power consumption [2].

For CMOS circuit the total power dissipation includes dynamic and static components during the active mode of operation. The dynamic (switch) power and leakage power are given as:

$$P_D = \alpha f C V_{DD}^2 \quad (1)$$

Where  $\alpha$  = Switching activity,  $f$  = Operation frequency,  $C$  = Load capacitance,  $V_{DD}$  = Supply voltage
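As a quick illustration of equation (1), the sketch below evaluates the dynamic power at the three supply voltages used later in the paper. The switching activity, frequency and load capacitance are hypothetical values chosen only for illustration and are not taken from the paper; the function name `dynamic_power` is likewise an assumption.

```python
def dynamic_power(alpha, f, c_load, vdd):
    """Switching power from equation (1): alpha * f * C * VDD^2, in watts."""
    return alpha * f * c_load * vdd ** 2


# Hypothetical operating point (not from the paper).
ALPHA = 0.5        # switching activity
FREQ = 100e6       # operating frequency, Hz
C_LOAD = 10e-15    # load capacitance, F

for vdd in (0.5, 0.7, 0.9):
    p = dynamic_power(ALPHA, FREQ, C_LOAD, vdd)
    print(f"VDD = {vdd} V -> P_D = {p:.3e} W")
```

The quadratic dependence on the supply voltage is why lowering the supply is so effective for dynamic power, and the linear dependence on switching activity is what clock gating exploits.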

Leakage power is given by

$$P_{Leakage} = I_{Leakage} V_{DD} \quad (2)$$

In a CMOS inverter, current flows between drain and source when $V_{gs} > V_t$. In real transistors, the current does not cut off abruptly below threshold but drops off exponentially. This condition is known as leakage conduction and results in undesired current when transistors are nominally off. The leakage current is the sub-threshold or weak-inversion current that flows from drain to source of a transistor when it is off (i.e. $V_{gs} < V_t$). The leakage current drawn from the circuit when the threshold voltage is greater than the gate-to-source voltage is given by equation (3). So the current can be decreased by increasing the threshold voltage and substrate voltage, and by reducing the gate-source voltage, drain-source voltage and temperature.

$$I_{ds} = I_{ds0} e^{\frac{V_{gs}-V_t}{nV_T}} \left\{ 1 - e^{\frac{-V_{ds}}{V_T}} \right\} \quad (3)$$

Where $I_{ds0}$ = Current at threshold, $V_t$ = Zero-bias threshold voltage, $n$ = Sub-threshold swing coefficient, $V_T$ = Thermal voltage.
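Equations (2) and (3) can be evaluated numerically to see how strongly the leakage depends on the threshold voltage. The sketch below is a direct transcription of those two equations; the device parameters in the example (current at threshold, swing coefficient, temperature, bias voltages) are hypothetical and not taken from the paper, and the function names are assumptions.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
Q_E = 1.602176634e-19    # elementary charge, C


def thermal_voltage(temp_k=300.0):
    """Thermal voltage kT/q in volts."""
    return K_B * temp_k / Q_E


def subthreshold_current(i_ds0, v_gs, v_t, v_ds, n=1.5, temp_k=300.0):
    """Equation (3): sub-threshold drain current in amperes."""
    v_thermal = thermal_voltage(temp_k)
    return (i_ds0
            * math.exp((v_gs - v_t) / (n * v_thermal))
            * (1.0 - math.exp(-v_ds / v_thermal)))


def leakage_power(i_leakage, vdd):
    """Equation (2): leakage power in watts."""
    return i_leakage * vdd


# Hypothetical off transistor (V_gs = 0) at VDD = 0.7 V.
for v_t in (0.3, 0.4):
    i_off = subthreshold_current(i_ds0=1e-7, v_gs=0.0, v_t=v_t, v_ds=0.7)
    print(f"V_t = {v_t} V -> I_off = {i_off:.3e} A, "
          f"P_leak = {leakage_power(i_off, 0.7):.3e} W")
```

Raising the threshold voltage in this model lowers the off current exponentially, which is the motivation for using high-threshold sleep transistors.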

In this paper, we have implemented three important techniques, namely the CMOS technique, the Sleepy technique and the Forced Stack technique, to design a 4-bit synchronous counter, and studied its power consumption for various supply voltages. The main purpose is to compare the power consumption of the different techniques and to analyse why a particular technique consumes less power. This will ultimately motivate researchers to design new circuits that consume less power, especially circuits that reduce leakage power or static power consumption [3-4].

## II. REVIEW OF PREVIOUS RESEARCHES

This section briefly reviews some previous counter circuits designed by different techniques on different technologies. Shilpa Shrigiri et al. designed a 16-bit binary counter implemented with clock gating at the 4-bit level. They compared the power consumption of the counter implemented with clock gating and the one with normal implementation, and found a power reduction of 61.12%. In their design, four 4-bit counter blocks are stitched together using the clock gating logic. The clock gating logic for any stage is fed by an AND function of the signals from the previous stage. The NXT-AND signal acts as the enable signal for the normal counter and as the gating signal for the clock-gated counter [5]. Praween Sinha et al. designed a four-bit binary counter using enhancement-type MOSFETs. They designed a T flip-flop using an X-OR gate and a D flip-flop, and connected the flip-flops with AND gates to make a 4-bit binary counter. This counter worked at a voltage of 3V. It had a maximum clock frequency of 476 MHz, and maximum rise and fall times of 2.1ns and 2ns. It showed reduced power consumption and a better output signal level [6].

Calvignac Yvan et al. designed and simulated a 4-stage binary counter by CMOS techniques on 65 nm technology using Microwind and DSCH 3.1 software. They cascaded four D registers to make a 4-stage counter. Each D register was formed by connecting two D flip-flops and a NOT gate [7]. Sandeep Thakur et al. designed and simulated a 4-bit ring counter using 45 nm technology. They designed a master-slave D flip-flop using NAND gates and an inverter, and four D flip-flops were cascaded to form the counter. The flip-flops were provided with the same clock pulse because of the synchronous nature of the ring counter. They designed the counter using the Cadence EDA tool, simulated it for 200ns, and analysed it using 45 nm technology at a 1 V rating. The delay and power dissipation of the proposed design were compared with those of the conventional design. The transistor count, delay and power dissipation were found to be 106, 120.125ns and 313.43pW respectively for the conventional ring counter, and 58, 5.216ns and 219.85pW for the proposed ring counter [8].

## III. THE PROPOSED COUNTER

We have designed a 4-bit synchronous counter using three different techniques, i.e. the CMOS technique, the Sleepy Transistor technique and the Forced Stack technique. A 4-bit counter consists of four D registers connected in series, along with four X-OR gates and three AND gates, as shown in Fig. 1. The clock pulse is provided to each D register simultaneously. The output of each AND gate acts as an input to the next cascaded AND gate, and the output of each XOR gate acts as the input to a D register. The output of the 4-bit counter is displayed on a hexadecimal display.
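The XOR and AND arrangement described above can be checked with a short behavioural model. The sketch below shows one plausible wiring, assuming an enable input tied to logic 1 that feeds the first XOR gate and the first AND gate; the exact connections in Fig. 1 may differ, and the function names are hypothetical. Each XOR output is the D input of one register, and each AND gate propagates the toggle condition to the next stage.

```python
def next_state(q, enable=1):
    """Next state of a 4-bit synchronous up counter; q is a list of bits, LSB first."""
    carry = enable
    nxt = []
    for bit in q:
        nxt.append(bit ^ carry)   # XOR gate output drives the D input
        carry = carry & bit       # AND gate passes the toggle condition onward
    # The carry out of the last stage is unused, so three AND gates suffice.
    return nxt


def run(cycles):
    """Return the counter value observed at each clock cycle, starting from 0."""
    q = [0, 0, 0, 0]
    sequence = []
    for _ in range(cycles):
        sequence.append(sum(bit << i for i, bit in enumerate(q)))
        q = next_state(q)
    return sequence


print(run(17))
```

With all registers clocked simultaneously, every bit is updated from the same previous state, which is what makes the counter synchronous.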



Fig 1: 4-bit counter Design

### A. Layout Design of X-OR gate

We designed X-OR gate using CMOS. In all three methods the basic building blocks are NMOS and PMOS. The difference is in their arrangement and number. The circuit diagram and layout design of X-OR using CMOS technique is shown in Fig. 2 and 3 respectively.



Fig 2: CMOS XOR Gate



**Fig 3:** Layout of XOR Gate

### B. Simulation for X-OR gate

We simulated the XOR gate using Microwind 3.1 on 65nm technology. The foundry voltage at 65nm is 0.7V. The XOR gate designed in CMOS shows full voltage swing at the output. Parametric analysis has been carried out at 5nm scale. Post-layout simulations of the XOR gate are shown in Fig. 4.



**Fig 4:** Post layout simulations of XOR gate

The circuit diagram and layout of the XOR gate using the Sleep Transistor Technique (STT) are shown in Fig. 5 and 6 respectively. This circuit has been designed using 8 PMOS and 8 NMOS transistors.



**Fig 5:** XOR gate using Sleep Transistor Technique



**Fig 6:** Layout of XOR gate by STT

The circuit structure of STT is shown in Fig. 7. It consists of a CMOS implementation of the actual logic and two high-threshold transistors acting as sleep transistors.



**Fig 7:** Circuit Structure of STT

### C. D-flip flop

A D flip-flop, also called a Data flip-flop or Delay flip-flop, is constructed from a gated SR flip-flop with an inverter added between the S and R inputs to allow for a single D (Data) input. This single data input replaces the “SET” signal, and the inverter generates the complementary “RESET” input, thereby making a level-sensitive D-type flip-flop. The block diagram of the D flip-flop using NAND and NOR gates is shown in Fig. 8.



**Fig 8:** D- FF using NAND Gate

We designed the D flip-flop using all three techniques mentioned. A D flip-flop module was formed and simulated so that it could be used for designing the counter circuit. The circuit design of the CMOS D flip-flop is shown in Fig. 9.



**Fig 9:** Circuit design of CMOS D flip flop

After studying the design and implementation of the D flip-flop, we extracted a similar D register module from the symbol box in Microwind 3.1 and constructed a 4-bit synchronous counter.

## IV. 4-BIT SYNCHRONOUS COUNTER: DESIGN AND WORKING

The basic block diagram of the 4-bit synchronous counter is shown in Fig 10. The counter module consists of 4 XOR gates, 3 AND gates and 4 D FFs. The output of each XOR gate acts as the input to the corresponding D FF. The output of the module can be observed on a hexadecimal display.



**Fig 10:** Basic block diagram of 4-bit synchronous counter

The circuit diagram of the 4-bit synchronous counter by the CMOS technique is shown in Fig. 11 and the corresponding layout is shown in Fig. 12. All modules are connected in the same way as shown in Fig. 10. To reduce circuit complexity, XOR gate and D FF modules designed by the CMOS technique only have been used.



**Fig 11:** 4-bit synchronous counter by CMOS

The layout design and post layout simulations for CMOS counter are shown in Fig. 12 and Fig. 13.



**Fig 12:** Layout of 4-bit synchronous counter by CMOS



**Fig 13:** Post Layout simulations of 4-bit synchronous counter by CMOS

Similarly, the circuit diagrams of the 4-bit synchronous counter by STT and FST are shown in Fig. 14 and 15 respectively.



Fig 14: 4-bit synchronous counter by STT



Fig 15: 4-bit synchronous counter by FST

## V. RESULT AND DISCUSSION

We designed a 4-bit synchronous counter using the CMOS technique, the Sleepy Transistor Technique and the Forced Stack Technique. The layout design and simulation results of all three counters were obtained. We measured the number of PMOS transistors, the number of NMOS transistors, and the height, width and surface area of the XOR gate and the synchronous counter in all three cases.

TABLE I: LAYOUT RESULT OF X-OR GATE

| Technique | PMOS | NMOS | Height (μm) | Width (μm) | Area (μm<sup>2</sup>) |
|-----------|------|------|-------------|------------|------------------------|
| CMOS      | 6    | 6    | 6.0         | 14.5       | 87.2                   |
| STT       | 8    | 8    | 8.3         | 23.7       | 197.7                  |
| FST       | 12   | 12   | 7.2         | 28.3       | 203.4                  |

TABLE II: LAYOUT RESULT OF 4-BIT SYNCHRONOUS COUNTER

| Technique | PMOS | NMOS | Height (μm) | Width (μm) | Area (μm<sup>2</sup>) |
|-----------|------|------|-------------|------------|------------------------|
| CMOS      | 69   | 77   | 19.9        | 70.7       | 1405                   |
| STT       | 107  | 115  | 39.9        | 71.1       | 2836.3                 |
| FST       | 102  | 110  | 314         | 74.5       | 2304                   |

Table I gives the height, width and surface area along with the number of PMOS and NMOS transistors for the X-OR gate designed by the three techniques, and Table II shows the same quantities for the designed counter.

The width-to-length ratios are taken as 6 for PMOS and 3 for NMOS in the CMOS technique and STT, whereas for circuits designed by FST the ratios are 3 and 1.5 for PMOS and NMOS respectively. We keep the length constant at 0.07μm and choose the appropriate width for each technique. The widths are 0.42μm and 0.21μm for PMOS and NMOS in the CMOS technique, and 0.21μm and 0.105μm in FST. The added Sleep and Sleep-bar NMOS and PMOS transistors in STT have width and length just double those in normal CMOS.

From the comparison tables it can be observed that the CMOS technique requires the fewest transistors and the least area for both the X-OR gate and the counter. For the X-OR gate, FST requires the most transistors and the largest surface area, with STT in between. For the counter, STT requires the most transistors and the largest surface area, with FST in between. However, our aim is to design and analyse a counter that consumes the least power irrespective of how much area it covers. Next, we measured the power consumed by the X-OR gate and the 4-bit synchronous counter at different supply voltages for each technique.

TABLE III: TOTAL POWER DISSIPATION BY THE BENCHMARK CIRCUITS ON 65NM TECHNOLOGY

| Power Reduction Technique | XOR Gate, 0.5 V | XOR Gate, 0.7 V | XOR Gate, 0.9 V | 4-Bit Counter, 0.5 V | 4-Bit Counter, 0.7 V | 4-Bit Counter, 0.9 V |
|---------------------------|-----------------|-----------------|-----------------|----------------------|----------------------|----------------------|
| CCT                       | 8.172           | 18.274          | 31.757          | 67.419               | 228                  | 538                  |
| STT                       | 0.234           | 0.465           | 0.861           | 37.89                | 125                  | 291                  |
| FST                       | 3.278           | 10.977          | 21.591          | 21.968               | 66.537               | 150                  |

From Table III it can be observed that the power consumed by the X-OR gate is highest with the CMOS technique and lowest with STT. We measured the power consumption at three different supply voltages, i.e. 0.5 V, 0.7 V and 0.9 V.

**TABLE IV: POWER DIFFERENCE FOR X-OR GATE**

| Voltage (V) | % Difference in power b/w CMOS & STT | % Difference in power b/w CMOS & FST | % Difference in power b/w FST & STT |
|-------------|--------------------------------------|--------------------------------------|-------------------------------------|
| 0.5         | 97.136                               | 59.887                               | 92.861                              |
| 0.7         | 97.451                               | 39.842                               | 95.763                              |
| 0.9         | 97.288                               | 32.011                               | 96.012                              |
| Average     | 97.291                               | 43.91                                | 94.878                              |

**TABLE V: POWER DIFFERENCE FOR 4-BIT SYNCHRONOUS COUNTER**

| Voltage(V) | % Difference in power b/w CMOS & STT | % Difference in power b/w CMOS & FST | % Difference in power b/w FST & STT |
|------------|--------------------------------------|--------------------------------------|-------------------------------------|
| 0.5        | 43.79                                | 67.41                                | 42.02                               |
| 0.7        | 45.175                               | 70.81                                | 46.77                               |
| 0.9        | 45.91                                | 72.118                               | 48.453                              |
| Average    | 44.958                               | 70.112                               | 45.747                              |

The percentage difference in the power consumed is calculated using the formula

$$\% \text{Difference in power} = \frac{\text{Difference in Power}}{\text{Total Power}} \times 100$$

We calculated the percentage difference in power consumed by X-OR gate and that by counter using the above mentioned formula. The percentage differences in power consumed between every two techniques have been tabulated in Table-IV and Table-V.
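The tabulated percentage differences can be reproduced from the total power dissipation table. The sketch below assumes that "Total Power" in the formula means the larger of the two power values being compared; this assumption is not stated in the paper, but it reproduces the tabulated entries to within rounding. The CMOS rows are the ones labelled CCT in the power table, and the names `POWER`, `percent_difference` and `difference_table` are illustrative.

```python
# Power values copied from the total power dissipation table
# (units as in the source table, which does not state them).
VOLTAGES = (0.5, 0.7, 0.9)
POWER = {
    "xor": {
        "CMOS": (8.172, 18.274, 31.757),
        "STT": (0.234, 0.465, 0.861),
        "FST": (3.278, 10.977, 21.591),
    },
    "counter": {
        "CMOS": (67.419, 228.0, 538.0),
        "STT": (37.89, 125.0, 291.0),
        "FST": (21.968, 66.537, 150.0),
    },
}


def percent_difference(p_a, p_b):
    """Difference relative to the larger value (assumed meaning of 'Total Power')."""
    return abs(p_a - p_b) / max(p_a, p_b) * 100.0


def difference_table(circuit):
    """Rows of (voltage, CMOS vs STT, CMOS vs FST, FST vs STT) in percent."""
    data = POWER[circuit]
    rows = []
    for i, v in enumerate(VOLTAGES):
        rows.append((
            v,
            percent_difference(data["CMOS"][i], data["STT"][i]),
            percent_difference(data["CMOS"][i], data["FST"][i]),
            percent_difference(data["FST"][i], data["STT"][i]),
        ))
    return rows


for circuit in POWER:
    print(circuit)
    for v, cmos_stt, cmos_fst, fst_stt in difference_table(circuit):
        print(f"  {v} V: CMOS/STT {cmos_stt:.3f}%  "
              f"CMOS/FST {cmos_fst:.3f}%  FST/STT {fst_stt:.3f}%")
```

Averaging each column over the three voltages gives the "Average" rows of the two difference tables.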

## VI. CONCLUSION

As the counter is one of the basic circuits of digital devices, power-efficient counter design has become essential for researchers. In this paper, we have designed a 4-bit binary synchronous counter using CCT, STT and FST. The circuit design and parametric analysis have been carried out using the Microwind 3.1 and DSCH 3.1 design tools. Various parameters have been observed on 65nm technology. From the observations it can be concluded that the power consumed by the FST counter and the STT counter is much less than the power consumed by the CMOS counter. The average power reduction was 44.9% in the case of STT and 70.1% in the case of FST as compared to the CMOS counter design. Although these techniques are power efficient compared to the CMOS technique, this comes at the expense of larger surface area. Counters designed by these techniques can be useful where low power is the primary concern.

## REFERENCES

- Yogita Hiremath, Akalpita L. Kulkurani, J.S. Baligar “Design and implementation of synchronous 4-bit up counter using 180nm CMOS technology,” International Journal of Research in Engineering and Technology, Vol. 3, No.3, 2014.
- Vinay Kumar Madasu, B Kedharnath “Leakage power reduction by using Sleep methods,” International journal of engineering and computer science, Vol. 2, No.9, 2013.
- Rajani H.P and Shrimannarayan Kulkarni “Novel Sleep transistor technique low leakage power peripheral circuits,” International journal of VLSI design and communication system, Vol.3, No.4, 2012.
- Kaushik Roy, Saibal Mukhopadhyay, Hamid Mamoodi-Meimand, “Leakage current mechanism and leakage reduction techniques in deep sub micrometer CMOS circuits,” Proceedings of the IEEE, Vol.91, 2003.
- Shilpa Shrigiri, Yogina Bellad “Low power VLSI design approach for 16 bit binary counter to reduce power,” International journal of current engineering and technology.
- Praween Sinha, Shreyaansh Shrivastava “Design of a low power 4 bit binary counter using Enhancement type MOSFET,” International journal of research and technology Vol.1, No. 8, 2012.
- Calvignac Yvan, Cambonie Pierre “4-state binary counter,” www.microwind.net
- Sandeep Thakur, Rajesh Mehra, “Optimized Design and Simulation of Ring counter using 45nm technology,” International journal of engineering trends and technology Vol. 36, 2016.

# A 90 nm Leakage Control Transistor Based Clock Gating for Low Power Flip Flop Applications

Pritam Bhattacharjee<sup>1</sup>, Alak Majumder<sup>1</sup>, Tushar Dhabal Das<sup>2</sup>

<sup>1</sup>Department of Electronics & Communication Engineering

<sup>2</sup>Department of Electronic Science

National Institute of Technology, Arunachal Pradesh

Yupia, India - 791112

*pritam\_bhattacharjee@live.com, majumder.alak@gmail.com, tddas@hotmail.com*

**Abstract**— The continuous growing demand of portable battery-powered electronics devices hunts for Nano-electronic circuit design for ultra-low power applications by reducing dynamic power, static power and short circuit power. In sequential circuit elements of an IC, a notable amount of power dissipation occurs due to the rapid switching of high frequency clock signals, which do not fetch any data bit or information. The needless switching of clock, during the HOLD phase of either ‘logic 1’ or ‘logic 0’, may be abolished using gated clock. In this paper, we have presented a new clock gating technique incorporating Leakage Control Transistor. The improvised technique is employed to trigger a D-Flip Flop using 90nm PTM technology at 1.1V power supply. We have observed an impressive reduction in power, delay and latency using the proposed gating logic, which has outsmarted the existing works. The simulation is also performed in smaller technology nodes such as 65nm, 45nm and 32 nm to notice the change in delay, dynamic power and static power of the circuit.

**Keywords**—clock gating techniques; LECTOR; D Flip Flop; D Latch; Static & Dynamic Power.

## I. INTRODUCTION

With the advancement of technology, MOS transistors have been scaled down extensively. As a result, a large number of transistors are integrated on a single chip, which leads to large power dissipation. This has become a major concern alongside area, performance, reliability and speed. The increase in operating frequency and the low-power design trend have made it necessary to explore power reduction strategies. It is known that, in the nanometer regime, a major part of the power consumption is static power due to runtime leakage current from $V_{DD}$ to ground, and it must be blocked. Although voltage scaling serves the purpose, it limits the operation speed when the supply is scaled down towards the threshold voltage ($V_T$) of the circuit. Scaling down $V_T$ may address this issue, but it causes an exponential increase in sub-threshold leakage. To block the leakage, a technique based on the effective stacking of PMOS and NMOS transistors between $V_{DD}$ and ground was reported in 2004 and is called Leakage Control Transistor (LECTOR) [1]. This technique has the potential to reduce leakage through the power line at a small penalty in delay. The major part of the dynamic power dissipation, in turn, is switching power, which is mainly due to the global clock used in the design. To eliminate this switching power to a certain extent, the clock signal is deactivated for a definite amount of time when it is not needed. Hence, a part of the dynamic power can be saved by reducing the switching activity factor. This technique is known as Clock Gating [2, 3]. The principle of clock gating is shown in fig. 1.



Fig. 1. Principle of Clock Gating.

The system clock is gated using a gating logic to suppress its needless switching during idle time. The clock gating logic (CGL) is steered by a control signal. Various clock gating architectures exist, differing in how they use the control signal. In this paper, we curb both static and dynamic power dissipation simultaneously by using LECTOR in the clock gating logic. The proposed clock gating architecture is investigated on a D-FF using 90nm technology with a supply of 1.1V.

The paper is organized as follows: The pros and cons of previous works on clock gating are explored briefly in Section II. Section III deals with the working principle of our proposed gating technique. In section IV, the proposed gating technique is tested on a D-FF. The parametric analysis of the result is discussed in section V. Finally, section VI concludes the work.

## II. LITERATURE SURVEY

The various design styles of clock gating are broadly categorized as [7]:

### A. Latch-free based design

It is one of the simplest clock gating logics, using OR/AND logic depending on the type of triggering of the sequential block. This design is explored in [8, 10 and 12], where the system clock is the input to the clock gating logic and an enable signal acts as the control signal. With OR logic, the system clock is passed to the gated clock when the enable is zero; otherwise the gated clock line remains high. During the sleep period, there is no harm if the enable changes, but a change during the active period can corrupt the gated clock. As the enable is prone to signal noise, this design is not preferable for clock gated applications.
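The hazard of latch-free gating can be illustrated with a small behavioural sketch. The sampled waveforms, the four-samples-per-period clock and all function names below are illustrative assumptions, not taken from the cited designs; the model only shows that an enable change while the clock is high adds a spurious edge to an AND-gated clock.

```python
def or_gated_clock(clk, enable):
    """Latch-free OR gating: the clock passes while enable is 0, else output stays high."""
    return [c | e for c, e in zip(clk, enable)]


def and_gated_clock(clk, enable):
    """Latch-free AND gating: the clock passes while enable is 1, else output stays low."""
    return [c & e for c, e in zip(clk, enable)]


def rising_edges(wave):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)


# Hypothetical clock: four samples per period (low, low, high, high), four periods.
clk = [0, 0, 1, 1] * 4

# Enable that only changes while the clock is low (well behaved).
clean_enable = [1] * 4 + [0] * 8 + [1] * 4
# Same enable with a one-sample noise pulse while the clock is high (sample 7).
noisy_enable = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1]

clean_edges = rising_edges(and_gated_clock(clk, clean_enable))
noisy_edges = rising_edges(and_gated_clock(clk, noisy_enable))
```

With the clean enable the gated clock carries only the intended edges, while the noisy enable produces one additional edge that a downstream flip-flop would treat as a real clock event.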

### B. Latch-based design

It is implemented to solve the controllability problem of the enable in latch-free based design. Here, the enable is controlled by using a latch as shown in fig. 2.



Fig. 2. Latch-based gated clock design.

If it is a positive level latch, the enable is latched during the positive half cycle of the system clock. This means that the enable is sensed only in the active period, when the gated clock receives the system clock. The sleep period of the system clock cannot be captured by the gated clock, which is a design flaw.

### C. Flip-Flop based design

The enable is controlled using a flip-flop in such design styles. At the edges of the system clock, the enable is reflected at the flip-flop output to obtain the gated clock. This is why the sleep period in the flip-flop based design is considerably longer than that of the latch-based design.

To solve these predominant design errors, a modification has been reported in [4, 5, 6 and 11]. The modified approach is inspired by AND based clock gating and uses a positive level latch as shown in fig. 4. The gated clock is obtained from the AND gate output and is fed into a latch. The inputs to the AND gate are the system clock and the output of the XOR gate, which compares the latch input (D) with its output (Q). This is done to activate the AND based clock gating logic only when the data to the latch changes. The gated clock remains low as long as there is no change in data. This lowers the switching activity of the gated clock and saves a part of the dynamic power. For a negative level latch, the comparator and gated clock logic are implemented using XNOR and OR gates respectively. However, a large leakage current flows from $V_{DD}$ to ground during a transition at the data input of the latch. This leads to a large static power dissipation, which is to be prevented.

## III. PROPOSED CLOCK GATING

It is mentioned in the previous section that the clock gating style reported in [4] and [5] is unable to stop the contention current during the transition of the data line. Therefore, a blocking element is needed between $V_{DD}$ and ground to stop the leakage flow. This is possible by placing a pair of PMOS and NMOS transistors between the pull-up and pull-down stacks. This is known as LECTOR [9, 13]. It is based on the fact that two or more OFF transistors in the path from $V_{DD}$ to ground are less prone to current leakage than a single OFF transistor in that path [1]. The LECTOR transistors have their gate terminals connected to each other's source terminals as shown in fig. 3. During operation, one of the transistors is always near its cut-off region, thereby blocking the leakage current. This stops the contention current flow even if the data switches frequently.



Fig. 3. LECTOR Model.



Fig. 4. LECTOR based gated clock drives a positive-triggered Latch.

In our proposed model of clock gating, LECTOR is the major constituent added. As shown in fig. 4, the latch is positive-level sensitive, i.e., the latch is transparent only when the gated clock ($ckg$) = 1 and is in the hold state when $ckg$ = 0. Therefore, when there is data on 'D', the response of the latch depends on 'ckg'. The 'ckg' in turn depends on the XOR output and the system clock (ck). If 'D' is equal to 'Q', then 'ckg' will not follow 'ck', whatever its value. The 'ckg' will follow 'ck' only when 'D' and 'Q' are different. So, whenever there is a data stream at the 'D' input, the switching of the gated clock depends on the status of the present and previous data. As mentioned in [11], the input 'D' is fed to 'Q' to obstruct the unnecessary switching of the gated clock, which can be applied to specific sequential behaviours such as registers and counters. The architecture in fig. 4 can be applied to any sequential logic. Moreover, even though LECTOR introduces slightly more delay than CMOS, it will not slow down the gated sequential circuit, as the gating logic is never cascaded but is applied to individual flip-flops.
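The gating rule described above, with 'ckg' following 'ck' only while 'D' and 'Q' differ, can be sketched as an idealized zero-delay behavioural model. The sampled waveforms and the function names are assumptions made for illustration; in a real circuit the width of each 'ckg' pulse is set by gate delays, which this sketch ignores.

```python
def simulate_gated_latch(ck, d, q0=0):
    """Behavioural model of ckg = ck AND (D XOR Q) driving a positive-level latch."""
    q = q0
    ckg_trace, q_trace = [], []
    for ck_t, d_t in zip(ck, d):
        ckg = ck_t & (d_t ^ q)   # gated clock is high only when the data differs from Q
        if ckg:
            q = d_t              # latch is transparent while ckg is high
        ckg_trace.append(ckg)
        q_trace.append(q)
    return ckg_trace, q_trace


def rising_edges(wave):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)


# Hypothetical stimulus: eight clock periods, with the data changing only twice.
ck = [0, 1] * 8
d = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

ckg, q = simulate_gated_latch(ck, d)
system_clock_edges = rising_edges(ck)
gated_clock_edges = rising_edges(ckg)
```

In this stimulus the system clock rises eight times but the gated clock rises only twice, once per data change, while 'Q' still ends at the final value of 'D'.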

The novelty of our work is that we are able to block the leakage current in the clock gating logic circuit using LECTOR during the switching of the data input. Our clock gating logic is designed using an AND gate for the operation of a positive-triggered sequential component. Therefore, in this paper, we have presented a LECTOR based AND gate by cascading the LECTOR-NAND gate with a LECTOR-Inverter using 90nm PTM technology [15] and a supply voltage of 1.1V. The operating frequency of this AND gate is 1.25GHz. In fig. 5 and fig. 6, we have shown the schematic and transient response of the LECTOR-AND gate respectively.



Fig. 5. Schematic of LECTOR – based AND gate.



Fig. 6. Transient Analysis of LECTOR – based AND gate.

The AND gate output ‘C’ provides a stable full rail swing in response to the inputs ‘A’ and ‘B’. The average propagation delay reads as 76.97ps. This circuit offers an average current flow of  $4.203\mu\text{A}$  and average power dissipation of  $4.62\mu\text{W}$ . With a process variation of 10% at  $27^\circ\text{C}$ , we have observed 1.34% and 1.78% change in the rising and falling edge of the output ‘C’ respectively. So, it can be claimed that the proposed clock gating logic will offer good logic performance.
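As a quick consistency check, the reported average power can be compared with the product of the supply voltage and the reported average current. The sketch below assumes that the average power is simply $V_{DD}$ times the average supply current; the function name is illustrative.

```python
def average_power_uw(vdd_volts, avg_current_ua):
    """Average power in microwatts from supply voltage (V) and average current (uA)."""
    return vdd_volts * avg_current_ua


# Values quoted in the text for the LECTOR-AND gate at 90nm.
power_uw = average_power_uw(1.1, 4.203)
```

Under this assumption the product comes to about 4.62 µW, which agrees with the stated average power dissipation.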

## IV. D FLIP FLOP USING PROPOSED TECHNIQUE

We have employed the proposed clock gating logic to operate a D Flip-flop. D-FF is made using a master–slave latch configuration with cross-coupled inverters and transmission gates as shown in fig. 7. Both the master and slave latch are synchronised using the gated clock, where the gated clock orientation in master and slave latch is different. Therefore, either master or the slave latch remains active at a certain period. As the flip-flop is triggered by the gated clock, unnecessary switching activity is avoided when data ‘D’ does not change.

When ‘T1’ and ‘T4’ are active (if $\text{ckg}=1$), the current data passes through the master latch and the previous data is stored in the slave latch. If the current data is different from the previous data, as compared by the XOR gate, $\text{ckg}$ changes; otherwise it retains the previous state.



Fig. 7. Clock – Gated D Flip Flop using master – slave latch configuration.



Fig. 8. Transient Analysis of Clock – Gated D Flip Flop.

From the transient response of the gated flip flop shown in fig. 8, it is quite clear that the data input ‘D’ is propagated to ‘Q’ with full swing in reference to the gated clock. Therefore, there is no disruption in logic performance. No glitch is observed during the propagation of ‘D’ to ‘Q’ for both positive-edge and negative edge of gated clock, as the XOR based comparison is done with the master latch output.

## V. RESULT & ANALYSIS

In this section, we have analyzed the simulation results of the gated flip-flop in 90nm, 65nm, 45nm and 32nm PTM technology.

TABLE I. Timing Analysis of Gated Flip-Flops

| Parameters | DG-FF [4] | NC<sup>2</sup>MOS [4] | Proposed LECTOR Based Gated FF (90 nm) | Proposed (65 nm) | Proposed (45 nm) | Proposed (32 nm) | % Improvement w.r.t. DG-FF (90 nm) | (65 nm) | (45 nm) | (32 nm) | % Improvement w.r.t. NC<sup>2</sup>MOS (90 nm) | (65 nm) | (45 nm) | (32 nm) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PTM Technology (nm) | 800 | 800 | 90 | 65 | 45 | 32 | 90 | 65 | 45 | 32 | 90 | 65 | 45 | 32 |
| Setup Time (ns) | 1.40 | 1.07 | 0.041 | 0.031 | 0.0244 | 0.0186 | 97.07 | 97.80 | 98.25 | 98.67 | 96.16 | 97.12 | 97.72 | 98.26 |
| Hold Time (ns) | -1.04 | -1.01 | -0.031 | -0.016 | -0.018 | -0.014 | 97.01 | 98.46 | 98.25 | 98.70 | 96.93 | 98.41 | 98.20 | 98.66 |
| Delay (ns) | 1.35 | 0.98 | 0.084 | 0.083 | 0.051 | 0.039 | 93.73 | 93.85 | 96.25 | 97.12 | 91.36 | 91.53 | 94.84 | 96.31 |
| Latency (ns) | 2.75 | 2.05 | 0.1256 | 0.1145 | 0.0749 | 0.0575 | 95.43 | 95.83 | 97.27 | 97.91 | 93.87 | 94.41 | 96.34 | 97.19 |

TABLE II. Performance comparison of gated flip-flops

| Parameters | Proposed LECTOR Based Gated FF (90 nm) | Proposed (65 nm) | Proposed (45 nm) | Proposed (32 nm) | DG-FF [4] | NC<sup>2</sup>MOS [4] | [11] | [14] |
|---|---|---|---|---|---|---|---|---|
| PTM Technology (nm) | 90 | 65 | 45 | 32 | 800 | 800 | 90 | RTL |
| Dynamic Power/GHz ($\mu\text{W}$) | 10.49 | 9.40 | 7.48 | 11.42 | ----- | ----- | ---- | 175000 |
| Static Power ($\mu\text{W}$) | 5.64 | 7.03 | 8.33 | 6.40 | ----- | ----- | ---- | ----- |
| Average Power ($\mu\text{W}$) | 19.06 | 20.00 | 18.50 | 19.74 | 563.00 | 361.00 | 8.00 | ----- |
| Transistors Count | 58 | 58 | 58 | 58 | 42 | 28 | 300 | ----- |
| Clock Frequency (GHz) | 1.25 | 1.25 | 1.25 | 1.25 | 0.05 | 0.05 | 0.20 | 1.00 |

### A. Timing Analysis

A test circuit shown in fig. 9, has been developed in order to estimate the timing parameters of the flip flop. The time interval between ckg and Q\_flip\_flop during the rising transition is considered as propagation delay of the gated flip-flop. The time duration needed for the data on D<sub>b</sub> to get stable before the ckg\_bar arrives at the slave latch is known as setup time. On the other hand, time duration for which the data D<sub>b</sub> remains unchanged till the ckg\_bar switches is the hold time. Latency is calculated as the summation of setup and delay time. In Table I, we have displayed the timing analysis of our proposed design in comparison with the one reported in [4] and [5].
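The derived quantities in Table I can be reproduced with a few lines of arithmetic. The sketch below assumes that the percentage improvement is computed as the relative reduction with respect to the reference design; the function name is illustrative, and the inputs are the rounded values printed in Table I.

```python
def percent_improvement(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return (reference - proposed) / reference * 100.0


# 90nm values of the proposed design and DG-FF [4] values, as printed in Table I (ns).
setup_90, delay_90, latency_90 = 0.041, 0.084, 0.1256
setup_dgff, latency_dgff = 1.40, 2.75

# Latency as the sum of setup time and delay.
latency_from_sum = setup_90 + delay_90

setup_gain = percent_improvement(setup_dgff, setup_90)
latency_gain = percent_improvement(latency_dgff, latency_90)
```

The sum of the rounded setup time and delay is 0.125 ns, close to the tabulated latency of 0.1256 ns; the small gap is presumably due to rounding of the printed inputs.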



Fig. 9. Test Circuit for timing analysis.

### B. Power Analysis

The average power dissipated by the proposed gated D Flip Flop is $19.06\mu\text{W}$ at an operating frequency of $1.25\text{GHz}$ using PTM 90nm technology. Its dynamic power and static power are $10.49\mu\text{W}/\text{GHz}$ and $5.64\mu\text{W}$ respectively. In Table II, we have also analyzed the power consumption of our proposed logic in 65nm, 45nm and 32nm technology and compared it with existing gating logics. Although the incorporation of the new gating logic would be expected to increase the timing parameters, our design avoids this owing to the area compaction in terms of transistor size and process technology.

The proposed clock gated flip flop also provides a stable transient response when it is simulated with a process variation of 1-2% at a temperature of $27^\circ\text{C}$. As depicted in fig. 10, the proposed design maintains good performance in terms of power and delay across process technologies. It is also observed that the proposed gated flip-flop consumes considerably less power than the existing works.



Fig. 10. Bar Chart of Power and Delay of Proposed Gated Flip-flop

## VI. CONCLUSION

Clock gating is one of the power saving techniques that has been in general use from Pentium processors through to later processor generations. In this work, we have developed a new clock gating logic incorporating LECTOR, to be used mainly in the sequential logic of an IC. The proposed logic is simulated and implemented on a D flip-flop using 90nm PTM technology with an operating clock frequency of $1.25\text{GHz}$. When compared with 90nm technology, the 65nm implementation of the proposed logic shows a 10.43% saving in dynamic power with a loss of 24.76% and 4.93% on static and average power respectively. There has been a gain of 1.01% in delay and 8.34% in latency as well. It is observed from the timing analysis that the delay and latency decrease at lower technology nodes. Hence, the data sampling rate of the proposed gated flip-flop increases with down scaling of process technology.

## Acknowledgment

We are thankful to DEITY, Govt. of India, for providing the financial grant under Visvesvaraya PhD Scheme and SMDP-C2SD.

## References

- [1] Hanchate, N. et al. "LECTOR: a technique for leakage reduction in CMOS circuits." *IEEE Trans. on (VLSI) Sys.*, 12(2), 196-205.
- [2] Aanandam SK. "Deterministic clock gating for low power VLSI design" (Doctoral dissertation, National Institute of Technology, Rourkela).
- [3] Srinivasan, Nandita, et al. "Power Reduction by Clock Gating Technique." *Procedia Technology* 21 (2015): 631-635.
- [4] A.G.M. Strollo et al. "New clock-gating techniques for low-power flip-flops." Proc. of the Int. Symposium on Low power electronics and design. ACM, 2000.
- [5] A.G.M. Strollo et al. "Low-power flip-flops with reliable clock gating." *Microelectronics Journal* 32.1 (2001): 21-28.
- [6] Choi, J. H., et al. "Improved clock-gating control scheme for transparent pipeline." Proc. of the 2010 Asia and South Pacific Design Automation Conf., IEEE Press, 2010.
- [7] Shinde, J., et al. "Clock gating—A power optimizing technique for VLSI circuits." (INDICON), 2011 Annual IEEE.
- [8] Kathuria, J., et al. "A review of clock gating techniques." *MIT Int. Journal of Electronics and Comm. Engg.* 1.2 (2011): 106-114.
- [9] V. Preeti, et al. "Leakage power and delay analysis of LECTOR based CMOS circuits." (ICCCCT), IEEE, 2011.
- [10] D.K. Sharma "Effects of different clock gating techniques on design." *Int. Journal of Sci. & Engg. Research* 3.5 (2012): 1-4.
- [11] Shaker, M et al. "Novel clock gating techniques for low power flip-flops and its applications." (56<sup>th</sup> MWSCAS) on. IEEE, 2013.
- [12] H. Li, et al. "DCG: deterministic clock-gating for low-power microprocessor design." *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on* 12, no. 3 (2004): 245-254.
- [13] A.P. Shah, et al. "Comparative study of area, delay and power dissipation for LECTOR and INDEP (leakage control techniques) at 70 nm technology node." (IACC), IEEE, 2015.
- [14] M.P. Dev et.al. "Clock gated low power sequential circuit design." (ICT), IEEE, 2013.
- [15] <http://ptm.asu.edu/>

Received 1 August 2022, accepted 10 September 2022, date of publication 16 September 2022, date of current version 26 September 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3207151



# A Novel Clock Gating Approach for the Design of Low-Power Linear Feedback Shift Registers

GIANLUCA GIUSTOLISI<sup>ID1</sup>, (Senior Member, IEEE), ROSARIO MITA<sup>1</sup>, (Member, IEEE), GAETANO PALUMBO<sup>ID1</sup>, (Fellow, IEEE), AND GIUSEPPE SCOTTI<sup>ID2</sup>, (Senior Member, IEEE)

<sup>1</sup>DIEEI, Università degli Studi di Catania, 95125 Catania, Italy

<sup>2</sup>DIET, Università degli Studi di Roma “La Sapienza,” 00184 Rome, Italy

Corresponding author: Gianluca Giustolisi (gianluca.giustolisi@unict.it)

**ABSTRACT** This paper presents an efficient solution to reduce the power consumption of the popular linear feedback shift register by exploiting the gated clock approach. The power reduction with respect to other gated clock schemes is obtained by an efficient implementation of the logic gates and properly reducing the number of XOR gates in the feedback network. Transistor level simulations are performed by using standard cells in a 28-nm FD-SOI CMOS technology and a 300-MHz clock. Simulation results show a power reduction with respect to traditional implementations, which reaches values higher than 30%.

**INDEX TERMS** Complementary pass-transistor logic (CPL), gated clock, linear feedback shift register (LFSR), low-power design, transmission gate (TG).

## I. INTRODUCTION

Today, linear feedback shift registers (LFSRs) are widely used in many types of electronic equipment that require very fast generation of a pseudo-random sequence, such as built-in test of digital circuits [1], [2], [3], [4], [5], where the minimization of area, power and delay are the most important figures of merit. LFSRs are also fundamental building blocks in stream ciphers for secure communications used in GSM and LTE applications [6], and in lightweight stream ciphers for embedded systems [7]. Word-based LFSRs were introduced to efficiently use the structure of modern word-based processors. Such LFSRs are used in a variety of stream ciphers, most notably in the SNOW series of stream ciphers [8] and in image encryption applications [9]. LFSRs are also used to generate an approximation of white noise for parameter estimation and system identification purposes [10], and in the Global Positioning System where an LFSR is used to rapidly transmit a sequence that indicates high-precision relative time offsets [11]. LFSRs are also widely used in direct sequence spread spectrum (DSSS) systems [12], and in error detection and correction by implementing BCH (Bose, Chaudhuri, Hocquenghem) and CRC (cyclic redundancy codes) encoder and decoder circuits [13], [14], [15]. Recently, LFSRs have also been exploited to build strong physical unclonable functions (PUFs) for cryptographic applications [16], [17].

The associate editor coordinating the review of this manuscript and approving it for publication was Artur Antonyan<sup>ID</sup>.

**FIGURE 1.** Generic $n$-bit LFSR: Fibonacci or standard configuration (a) and Galois or modular configuration (b).

Hardware implementation of linear feedback shift registers can be obtained by adopting two alternative configurations, both depicted in Fig. 1, each generating the same output bit stream. The configurations are named Fibonacci configuration (Fig. 1a) also known as standard, many-to-one or external XOR gates, and the Galois configuration (Fig. 1b) also known as modular, one-to-many or internal XOR gates [18].

These topologies are very simple to build, but since the clock-path of all flip-flops toggles at every clock cycle, they waste a non-negligible amount of power.

Although the LFSR power consumption has been extensively addressed in literature [19], [20], [21], the proposed solutions reduced power consumption at the cost of an increased circuit complexity, thus obscuring the major advantage of the LFSRs. A gated clock solution to reduce power consumption of the LFSRs has been also proposed by one of the authors in [22], where the analysis demonstrated that the power reduction strongly depends on the technological characteristics of the employed gates.

Moreover, in the same paper it has been found that a relationship involving technology parameters has to be satisfied in order to achieve a power reduction with respect to a traditional (non-gated clock) LFSR. In particular, even if the above relationship involving technology parameters gets satisfied, the maximum power reduction allowed by the approach in [22] with respect to a traditional (non-gated clock) LFSR is below 10%.

In this paper we propose a more efficient gated clock design approach for LFSRs, which greatly reduces power consumption without unduly complicating the traditional simple topology. With respect to other gated clock schemes, the proposed approach allows more power saving, thanks to a power efficient implementation of the logic gates that implement the clock gating network, and by properly reducing the number of XOR gates in the feedback path. Indeed, the proposed approach has resulted in a power reduction that can reach values higher than 30%.

## II. BACKGROUND

### A. LINEAR FEEDBACK SHIFT REGISTER

A linear feedback shift register (LFSR) is a shift register whose input bit is a linear function of its previous state.

By referring to the standard implementation in Fig. 1, LFSR is realized with an array of flip-flops (FFs) with a linear feedback performed by several XOR gates.

The initial value of the LFSR is called the seed, and since the operation of the register is deterministic, the stream of values produced by the register is completely determined by its current state. Although LFSRs are very simple to implement, they are based on a rather complex mathematical theory [23]. However, they can be efficiently described through the  $n^{\text{th}}$ -order polynomial

$$p_c = x^n + c_{n-1}x^{n-1} + \dots + c_1x + 1 \quad (1)$$

where the binary coefficients $c_i$ ($i = 1, 2, \dots, n - 1$) define the well-known characteristic polynomial ($p_c$), which sets the length of the pseudo-random sequence and the other statistical properties of the bit generator.

**FIGURE 2.** Traditional gated clock circuit for FFs without enable signal.

**FIGURE 3.** Gated clock LFSR implementation.
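How the characteristic polynomial (1) fixes the generated sequence can be sketched with a small software model of a Fibonacci LFSR. The bit ordering, the seed and the function names are assumptions made for illustration and may not match the exact wiring of Fig. 1a; the feedback is the XOR of the register bits at the exponents with non-zero coefficient, including $x^0$.

```python
def lfsr_step(state, exponents):
    """Advance the register by one clock: shift and insert the XOR of the tapped bits."""
    feedback = 0
    for e in exponents:
        feedback ^= state[e]
    return state[1:] + [feedback]


def lfsr_sequence(exponents, seed, length):
    """Return the first `length` output bits, taken from stage 0 of the register."""
    state = list(seed)
    out = []
    for _ in range(length):
        out.append(state[0])
        state = lfsr_step(state, exponents)
    return out


def lfsr_period(exponents, seed):
    """Number of clock cycles before the register returns to its seed."""
    state = lfsr_step(list(seed), exponents)
    steps = 1
    while state != list(seed):
        state = lfsr_step(state, exponents)
        steps += 1
    return steps


# p_c = x^5 + x^2 + 1: exponents below n with non-zero coefficient are 0 and 2.
taps = (0, 2)
seed = [1, 0, 0, 0, 0]          # any non-zero seed; this one is arbitrary
first_bits = lfsr_sequence(taps, seed, 10)
period = lfsr_period(taps, seed)
```

For this polynomial the register visits all $2^5 - 1 = 31$ non-zero states before repeating, which is the maximum period for a 5-bit LFSR.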

By defining  $P_{FF}$  and  $P_{XOR}$  the power consumption of the FFs and the XOR gates, respectively, the power consumption of the conventional LFSR in Fig. 1 can be modeled as

$$P_{Conv} = nP_{FF} + n_t\alpha P_{XOR} \quad (2)$$

where $n$ is the register length (i.e., the order of the generator), $n_t$ is the number of the inner taps (i.e., the number of terms of the characteristic polynomial except $x^n$ and 1), and $\alpha$ is the switching activity at the inner nodes, which, in an LFSR with $n \geq 6$ and assuming maximum period, is approximately equal to 0.5 [22].

From (2), it appears that for the topologies in Fig. 1 the clock path toggles at every clock cycle, thus dissipating a significant amount of power especially at high clock rates.

Conversely, the power consumption of the FF D-path and of the XOR gates depends on the switching activity, and hence its value is reduced by 50% with respect to the maximum value.

### B. DYNAMIC POWER MANAGEMENT

Dynamic Power Management (DPM) is a commonly adopted strategy to reduce power consumption in a digital system. It consists in disabling the logic circuits that are not performing functional operations during a particular time frame.

At circuit level, this strategy is known as “gated clock approach” [24], [25] and, for flip-flops with no enable signal, it consists in their activation only when the input signal is different from the actual output value, according to the scheme depicted in Fig. 2.

A modified LFSR that takes advantage of the gated clock strategy is shown in Fig. 3. The topology reduces the flip-flop power consumption, $P_{FF}$, at the price of additional power consumption due to the extra gates required to implement the gated clock approach.
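The effect of gating every flip-flop on the condition that its input differs from its output can be estimated with a behavioural sketch. The register model, the seed and the function name below are illustrative assumptions rather than the circuit of Fig. 3; the sketch counts, over one full period, the fraction of flip-flop clock events that are still delivered.

```python
def gated_clock_activity(exponents, n):
    """Fraction of flip-flop clock events delivered when each FF is clocked only if D != Q."""
    seed = [1] + [0] * (n - 1)   # arbitrary non-zero seed
    state = list(seed)
    clocked = 0                  # clock events that reach a flip-flop
    cycles = 0
    while True:
        feedback = 0
        for e in exponents:
            feedback ^= state[e]
        d_inputs = state[1:] + [feedback]              # D input of each flip-flop
        enables = [d ^ q for d, q in zip(d_inputs, state)]
        clocked += sum(enables)
        # A flip-flop whose clock is suppressed keeps Q, which already equals D.
        state = [d if en else q for d, q, en in zip(d_inputs, state, enables)]
        cycles += 1
        if state == seed:
            return clocked / (n * cycles)


# p_c = x^5 + x^2 + 1 (exponents 0 and 2 feed the XOR network).
activity = gated_clock_activity((0, 2), 5)
```

For this 5-bit example the fraction is 16/31, roughly 0.52, which is in line with the switching activity of about 0.5 quoted for maximum-period LFSRs with $n \geq 6$; the gated register still steps through the same state sequence as the ungated one.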

Therefore, for the gated clock LFSR in Fig. 3, the power consumption in (2) turns into

$$P_{GC} \approx n\alpha P'_{FF} + (n + n_t) \alpha P_{XOR} + n\alpha P_{NAND} \quad (3)$$

where the term  $n \cdot \alpha \cdot P'_{FF}$  represents the dissipation of the FFs with the new load conditions (i.e., the extra XOR gates).
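Subtracting (2) from (3) gives a rough break-even condition for the gated clock LFSR. The short derivation below uses only the symbols already defined and is a sketch based on the approximate model (3), not necessarily the relationship derived in [22]:

$$P_{GC} < P_{Conv} \iff n\alpha P'_{FF} + n\alpha P_{XOR} + n\alpha P_{NAND} < nP_{FF} \iff \alpha \left( P'_{FF} + P_{XOR} + P_{NAND} \right) < P_{FF}$$

With $\alpha \approx 0.5$, the gated clock topology of Fig. 3 therefore saves power only if $P'_{FF} + P_{XOR} + P_{NAND} < 2P_{FF}$, which shows why the outcome depends on the power of the gates added to each flip-flop.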

In [22], to further reduce the power consumption of the extra gates, the authors proposed a single CMOS XORNAND gate to drive the clock terminals of the FFs. The power dissipation was estimated as

$$P_{GC[22]} \approx n\alpha P'_{FF} + n_t \alpha P_{XOR} + n\alpha P_{XORNAND} \quad (4)$$

but, the reduction in the overall power dissipation with respect to a traditional (non-gated clock) LFSR was no better than 10%, thus limiting the benefit of the proposed topology.

## III. IMPROVED GATED CLOCK IMPLEMENTATION

### A. EFFICIENT LOGIC GATE IMPLEMENTATION

Reducing the overall power dissipation can be accomplished by reducing the power consumption of the term  $P_{XORNAND}$  in (4). This can be done by means of the power-aware solution depicted in Fig. 4, which combines the benefits of the complementary pass transistor logic (CPL-XOR/XNOR) with the transmission gate approach (TG-MUX) [26]. It is worth noting that the complementary signals required by the CPL-XOR/XNOR section are easily available as output signals of many FF standard cells. Moreover, the complementary outputs of the CPL-XOR/XNOR section are perfectly tailored to drive the TG-MUX section since they guarantee a full voltage swing at the output node of the XORAND gate without any additional level restoring transistors.

The power consumption of a gated clock LFSR implemented using the XORAND circuit in fig. 4 can be modeled as

$$P_{CPT\_TG} \approx n\alpha P''_{FF} + n_t \alpha P_{XOR} \quad (5)$$

where the power consumption of the gated circuit,  $P_{XORNAND}$ , is virtually eliminated and the FFs power consumption,  $P_{FF}''$ , accounts for the smaller capacitive effects due to both CPL and TG circuits.

### B. REDUCED XOR NUMBER

To further cut down the LFSR power consumption, we propose an additional strategy to reduce the number of XOR gates in the feedback path, $n_t$, by taking advantage of the CPL-XOR/XNOR section in Fig. 4. Indeed, at the output of this CPL gate we have a binomial $x^{i+1} \oplus x^i$, with index $i$ from 0 to $n - 2$, which can be used to save XORs in the feedback path. For example, considering the polynomial $x^7 + x^3 + x^2 + x + 1$, instead of using three XORs in the feedback path to implement $x^3 \oplus (x^2 \oplus (x \oplus 1))$, we can simply do the XOR of the binomials $x^3 \oplus x^2$ and $x \oplus 1$ available at the outputs of the CPL gates. Moreover, in case of non-adjacent taps, we can exploit the property $x^i \oplus x^i = 0$.

For example, the polynomial $x^5 + x^2 + 1$, which needs only one XOR in the traditional topology, can be implemented again with only one XOR whose inputs are the binomials $(x^2 + x)$ and $(x + 1)$ available at the outputs of the CPL-XOR/XNOR.

**FIGURE 4.** Power-aware XORAND for gated clock implementation.
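The two examples above rely on simple XOR identities, which can be checked exhaustively over all bit values. In the sketch below, the variables `x0` to `x3` stand for the flip-flop outputs associated with $1, x, x^2, x^3$; the function names are illustrative.

```python
from itertools import product


def paired_equals_chain():
    """x3 ^ (x2 ^ (x1 ^ x0)) equals (x3 ^ x2) ^ (x1 ^ x0) for all inputs."""
    return all(
        (x3 ^ (x2 ^ (x1 ^ x0))) == ((x3 ^ x2) ^ (x1 ^ x0))
        for x3, x2, x1, x0 in product((0, 1), repeat=4)
    )


def shared_term_cancels():
    """x2 ^ x0 equals (x2 ^ x1) ^ (x1 ^ x0), because x1 ^ x1 = 0."""
    return all(
        (x2 ^ x0) == ((x2 ^ x1) ^ (x1 ^ x0))
        for x2, x1, x0 in product((0, 1), repeat=3)
    )


first_example_ok = paired_equals_chain()    # feedback of x^7 + x^3 + x^2 + x + 1
second_example_ok = shared_term_cancels()   # feedback of x^5 + x^2 + 1
```

Both checks hold for every input combination, so a single XOR of two CPL binomials reproduces the original feedback function in each case.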

To derive the number of XOR gates required in the feedback network by using the proposed strategy, let us consider the ordered $m$-element array, $a_i$, of the tap exponents (for example, for the polynomial $x^{10} + x^4 + x^3 + x + 1$ the array elements are $a_1 = 1$, $a_2 = 3$, $a_3 = 4$ and $a_4 = 10$). Then, the number of XORs required in the feedback network is given by

$$n'_t = a_1 - 1 + \sum_{i=1}^{\frac{m}{2}-1} (a_{2i+1} - a_{2i}) \quad (6)$$

Note that in (6) $a_1$ is the lowest exponent of the characteristic polynomial, and the terms in the sum are differences between couples of neighboring tap exponents, excluding the highest one.

By inspection of relationship (6), it is apparent that the minimum number of XORs is required when the characteristic polynomial contains the term $x$ and all the couples of taps are adjacent.
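Relationship (6) is easy to evaluate programmatically. The sketch below is a direct transcription of (6) under the assumption that the exponent array has an even number of elements $m$, as the upper limit of the sum implies; the function name is illustrative.

```python
def xor_count_binomials(tap_exponents):
    """Evaluate (6): a_1 - 1 + sum over i = 1 .. m/2 - 1 of (a_{2i+1} - a_{2i})."""
    a = sorted(tap_exponents)     # a[0] is a_1 and a[-1] is the degree n
    m = len(a)
    if m % 2:
        raise ValueError("(6) is written for an even number m of exponents")
    total = a[0] - 1
    for i in range(1, m // 2):
        # a_{2i+1} and a_{2i} in the 1-based notation of (6) are a[2i] and a[2i - 1] here.
        total += a[2 * i] - a[2 * i - 1]
    return total


# x^10 + x^4 + x^3 + x + 1, i.e. a = (1, 3, 4, 10) as in the text.
example = xor_count_binomials([1, 3, 4, 10])
```

For the polynomial used in the text the function returns 1, and for $x^7 + x^3 + 1$ it returns 2, one more than the single XOR of the traditional topology.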

Table 1 summarizes the number of XOR gates necessary to implement the feedback circuit of some characteristic polynomials both in the traditional topology,  $n_t$  (i.e., number of the inner taps), and by adopting the proposed strategy,  $n'_t$  evaluated through relationship (6).

If we now focus on Table 1, it is apparent that the proposed strategy does not always need a lower number of XOR gates. Thus, to achieve a further reduction on the number of XOR gates, we can efficiently use together the outputs of the CPL-XOR/XNOR sections (i.e., the terms  $x^{i+1} \oplus x^i$ ), and the terms  $x^i$  at the outputs of the FFs.

Thus, a further reduction on the number of XOR gates in the feedback path is achieved, since it results equal to

$$n''_t = n_t - m_c \quad (7)$$

where $m_c$ is the number of couples of adjacent taps, with each tap counted in only one couple. For example, in the polynomial $x^{10} + x^4 + x^3 + x + 1$ there are $m_c = 2$ couples of adjacent taps, namely $(x^4 + x^3)$ and $(x + 1)$.

**TABLE 1.** Number of XORs in the linear feedback path of some LFSRs.

| Characteristic polynomial | $n_t$ | $n'_t$ | $n''_t$ | $n''_t - n_t$ |
|---|---|---|---|---|
| $x^5 + x^2 + 1^{(1)}$ | 1 | 1 | 1 | 0 |
| $x^5 + x^3 + x^2 + x + 1$ | 3 | 1 | 1 | -2 |
| $x^7 + x^3 + 1^{(2)}$ | 1 | 2 | 1 | 0 |
| $x^7 + x^3 + x^2 + x + 1$ | 3 | 1 | 1 | -2 |
| $x^7 + x^5 + x^4 + x^3 + x^2 + x + 1$ | 5 | 2 | 2 | -3 |
| $x^{10} + x^3 + 1$ | 1 | 2 | 1 | 0 |
| $x^{10} + x^4 + x^3 + x + 1$ | 3 | 1 | 1 | -2 |
| $x^{10} + x^6 + x^5 + x^3 + x^2 + x + 1$ | 5 | 2 | 2 | -3 |
| $x^{10} + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1$ | 7 | 3 | 3 | -4 |
| $x^{16} + x^5 + x^3 + x^2 + 1$ | 3 | 3 | 2 | -1 |
| $x^{16} + x^5 + x^4 + x^3 + x^2 + x + 1$ | 5 | 2 | 2 | -3 |
| $x^{16} + x^8 + x^7 + x^5 + x^4 + x^3 + x^2 + x + 1$ | 7 | 3 | 3 | -4 |
| $x^{16} + x^{15} + x^{11} + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1^{(3)}$ | 9 | 9 | 6 | -3 |

(1) Used in CRC-5-USB token packets

(2) Used in CRC-7 Telecom systems, ITU-T G.707, ITU-T G.832, SD/MMC

(3) Used in CRC-16-T10-DIF SCSI DIF

Note that now, unlike for relationship (6), the term  $x^0$  is also included to find the adjacent couples.

Table 1 also reports the number of XOR gates required by this strategy, $n''_t$. The number of XOR gates in the feedback path is now never higher than $n_t$, which reduces the power contribution of the feedback network.
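The counting rule behind relationship (7) can be sketched in Python as follows. The helper names are hypothetical. The sketch pairs taps greedily from the lowest exponent, includes $x^0$, and leaves the highest exponent out of the pairing; this last point is an assumption, chosen because it reproduces the $n''_t$ values listed in Table 1.

```python
def inner_taps(exponents):
    """n_t: number of inner taps (non-zero exponents except the highest)."""
    return len(set(exponents)) - 1


def adjacent_couples(exponents):
    """m_c: couples of adjacent taps, each tap used in at most one couple."""
    # Assumption: x^0 takes part in the pairing, the highest exponent does not.
    taps = sorted(set(exponents) | {0})[:-1]
    couples, i = 0, 0
    while i + 1 < len(taps):
        if taps[i + 1] - taps[i] == 1:
            couples += 1
            i += 2                      # both taps of the couple are now used
        else:
            i += 1
    return couples


def xor_count_eq7(exponents):
    """n''_t = n_t - m_c, relationship (7)."""
    return inner_taps(exponents) - adjacent_couples(exponents)


# x^10 + x^4 + x^3 + x + 1: couples (x^4, x^3) and (x, 1)
print(adjacent_couples([1, 3, 4, 10]), xor_count_eq7([1, 3, 4, 10]))  # prints 2 1
```

Pairing greedily from the lowest exponent is sufficient here because a run of $k$ consecutive exponents can never yield more than $\lfloor k/2 \rfloor$ disjoint couples.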

Finally, it should be noted that in the feedback network, where XOR gates have to drive FFs instead of TGs, it is not convenient to implement the XOR gates with the CPL topology. Indeed, the highest output voltage of a CPL gate is $V_{DD} - V_{tn}$, one NMOS threshold voltage below the supply (i.e., a weak logical '1'). This value may not be high enough to switch off the PMOS transistors at the input of the FFs, so a static power consumption contribution may arise. Thus, unless additional level-restoring transistors are included, CPL-XOR/XNOR gates in the feedback network are less efficient than the traditional CMOS implementation [27].

## IV. DESIGN EXAMPLES AND SIMULATION RESULTS

We have compared the power consumption of LFSRs designed with the proposed gated-clock approach, with the traditional implementation, and with the solution given in [22]. The proposed approach reduces power consumption without severely affecting the critical path of the circuit, and thus without limiting the speed of the serial LFSR, which exhibits the lowest critical path delay among all LFSR architectures. Parallel approaches [13], [14], [15] have recently been proposed, specifically targeted at BCH and CRC encoders. Because their architecture is very different, a comparison with the LFSR presented in this paper would not be fair, so we do not include them.

**FIGURE 5.** Simplified schematic of the used D-type flip-flop.

**FIGURE 6.** Simplified schematic of the speed-optimized XOR gate included in the STM standard-cell library.

Specifically, using a commercial 28-nm CMOS FD-SOI technology process in the Cadence simulation environment, we have run several transistor-level simulations on the topologies having the characteristic polynomials in Table 1. For the digital blocks, we used the master-slave positive-edge-triggered D-type flip-flop depicted in Fig. 5 and the two-input speed-optimized XOR gate in Fig. 6, both included in a standard-threshold-voltage, low-power standard-cell library. In addition, for the circuits reported in [26] and in Fig. 4, we used the thin-oxide N-type and P-type MOSFETs with low threshold voltage and minimum channel length of 28 nm, included in the same design kit. All circuits have been clocked at 300 MHz and powered at 1 V.
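For intuition on why gating the flip-flop clocks pays off, the short Python sketch below models a serial LFSR behaviourally and counts how often each flip-flop actually changes state. It is only an illustration under stated assumptions: a Fibonacci-style shift register is assumed, the function name `ff_toggle_counts` is hypothetical, and nothing here models the transistor-level circuits simulated in Cadence.

```python
def ff_toggle_counts(exponents, seed=1):
    """Simulate one full period of a serial LFSR; return (period, toggles per FF)."""
    n = max(exponents)                                # register length
    taps = [e for e in exponents if e < n] + [0]      # recurrence taps, x^0 included
    state = [(seed >> i) & 1 for i in range(n)]       # state[i] holds s_{t+i}
    start = list(state)
    toggles = [0] * n
    period = 0
    while True:
        feedback = 0
        for t in taps:
            feedback ^= state[t]                      # s_{t+n} = XOR of the tapped bits
        new_state = state[1:] + [feedback]            # shift by one position
        for i in range(n):
            toggles[i] += state[i] != new_state[i]    # FF i changed its value
        state = new_state
        period += 1
        if state == start:
            return period, toggles


# x^5 + x^2 + 1 (primitive, so the period is 2^5 - 1)
print(ff_toggle_counts([2, 5]))   # prints (31, [16, 16, 16, 16, 16])
```

In this model each flip-flop changes value in 16 of the 31 cycles of a full period, so its clock edge does useful work in only about half of the cycles.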

The simulation results of the LFSRs designed with the different approaches are summarized in Table 2. Comparing the approach proposed in [22] with the traditional implementation, we note that the clock-gated design reduces the power consumption of the FFs by nearly 25%, but the overall power reduction is considerably smaller (between about 9% and 15% in Table 2), since extra gates are introduced to implement the gated circuit. In other words, the XORNAND gates contribute 12-18% of the overall power consumption, with an inverse dependence on the number of taps.

**TABLE 2.** Power consumption (expressed in $\mu\text{W}$) of the simulated LFSRs.

| Characteristic polynomial | Conventional Eq. (2) | | Gated Clock [22] Eq. (4) | | | Proposed approach Eq. (5) | |
|---|---|---|---|---|---|---|---|
| | $nP_{FF}$ | $n_t \alpha P_{XOR}$ | $n \alpha P'_{FF}$ | $n_t \alpha P_{XOR}$ | $n \alpha P_{XORNAND}$ | $n \alpha P''_{FF}$ | $n_t'' \alpha P_{XOR}$ |
| $x^5 + x + 1$ | 7.93 | 0.21 | 5.98 | 0.24 | 0.93 | 6.39 | 0.27 |
| | $P_{Conv}=8.14$ | | $P_{GC[22]}=7.15 (-12.2\%)$ | | | $P_{CPL\_TG\_imp}=6.65 (-18.3\%)$ | |
| $x^5 + x^3 + x^2 + x + 1$ | 8.11 | 0.798 | 6.17 | 0.84 | 0.95 | 6.34 | 0.36 |
| | $P_{Conv}=8.91$ | | $P_{GC[22]}=7.97 (-10.7\%)$ | | | $P_{CPL\_TG\_imp}=6.70 (-24.9\%)$ | |
| $x^7 + x^3 + 1$ | 11.08 | 0.78 | 8.22 | 0.23 | 1.28 | 8.73 | 0.26 |
| | $P_{Conv}=11.99$ | | $P_{GC[22]}=10.48 (-12.6\%)$ | | | $P_{CPL\_TG\_imp}=9.10 (-24.2\%)$ | |
| $x^7 + x^3 + x^2 + x + 1$ | 11.20 | 0.361 | 8.38 | 0.82 | 1.28 | 8.74 | 0.35 |
| | $P_{Conv}=11.56$ | | $P_{GC[22]}=10.48 (-9.4\%)$ | | | $P_{CPL\_TG\_imp}=9.10 (-20.5\%)$ | |
| $x^7 + x^5 + x^4 + x^3 + x^2 + x + 1$ | 11.51 | 1.64 | 8.56 | 1.68 | 1.28 | 8.80 | 0.70 |
| | $P_{Conv}=13.14$ | | $P_{GC[22]}=11.53 (-12.3\%)$ | | | $P_{CPL\_TG\_imp}=9.50 (-27.8\%)$ | |
| $x^{10} + x^3 + 1$ | 15.78 | 0.201 | 11.69 | 0.23 | 1.84 | 12.51 | 0.26 |
| | $P_{Conv}=15.98$ | | $P_{GC[22]}=13.77 (-13.9\%)$ | | | $P_{CPL\_TG\_imp}=12.77 (-20.2\%)$ | |
| $x^{10} + x^4 + x^3 + x + 1$ | 15.94 | 0.77 | 11.86 | 0.81 | 1.81 | 12.39 | 0.35 |
| | $P_{Conv}=16.71$ | | $P_{GC[22]}=14.49 (-13.4\%)$ | | | $P_{CPL\_TG\_imp}=12.75 (-23.8\%)$ | |
| $x^{10} + x^6 + x^5 + x^3 + x^2 + x + 1$ | 16.10 | 1.63 | 12.02 | 1.66 | 1.84 | 12.47 | 0.70 |
| | $P_{Conv}=17.73$ | | $P_{GC[22]}=15.53 (-12.5\%)$ | | | $P_{CPL\_TG\_imp}=13.18 (-25.8\%)$ | |
| $x^{10} + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1$ | 16.59 | 2.84 | 12.92 | 2.90 | 1.79 | 13.29 | 0.99 |
| | $P_{Conv}=19.43$ | | $P_{GC[22]}=17.62 (-9.4\%)$ | | | $P_{CPL\_TG\_imp}=14.28 (-26.5\%)$ | |
| $x^{16} + x^5 + x^3 + x^2 + 1$ | 25.55 | 0.78 | 18.58 | 0.81 | 2.96 | 19.64 | 0.70 |
| | $P_{Conv}=26.33$ | | $P_{GC[22]}=22.35 (-15.2\%)$ | | | $P_{CPL\_TG\_imp}=20.35 (-22.8\%)$ | |
| $x^{16} + x^5 + x^4 + x^3 + x^2 + x + 1$ | 25.76 | 1.65 | 18.79 | 1.60 | 2.97 | 19.68 | 0.69 |
| | $P_{Conv}=27.41$ | | $P_{GC[22]}=23.36 (-14.8\%)$ | | | $P_{CPL\_TG\_imp}=20.38 (-25.7\%)$ | |
| $x^{16} + x^8 + x^7 + x^5 + x^4 + x^3 + x^2 + x + 1$ | 26.61 | 2.82 | 20.48 | 2.84 | 2.97 | 19.66 | 1.02 |
| | $P_{Conv}=29.43$ | | $P_{GC[22]}=26.29 (-10.7\%)$ | | | $P_{CPL\_TG\_imp}=20.69 (-29.7\%)$ | |
| $x^{16} + x^{15} + x^{11} + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1$ | 27.03 | 4.39 | 20.69 | 4.78 | 2.97 | 19.73 | 2.0916 |
| | $P_{Conv}=31.42$ | | $P_{GC[22]}=28.45 (-9.5\%)$ | | | $P_{CPL\_TG\_imp}=21.83 (-30.6\%)$ | |

**TABLE 3.** Area and critical path delay of the 16-bit simulated LFSRs.

| Characteristic polynomial | Conventional LFSR | | Gated Clock [22] | | Proposed approach | |
|---|---|---|---|---|---|---|
| | Area ( $\mu\text{m}^2$ ) | Delay (ns) | Area ( $\mu\text{m}^2$ ) | Delay (ns) | Area ( $\mu\text{m}^2$ ) | Delay (ns) |
| $x^{16} + x^5 + x^3 + x^2 + 1$ | 47.64 | 0.133 | 66.77 | 0.133 | 56.68 | 0.133 |
| $x^{16} + x^5 + x^4 + x^3 + x^2 + x + 1$ | 49.28 | 0.154 | 68.41 | 0.154 | 55.93 | 0.154 |
| $x^{16} + x^8 + x^7 + x^5 + x^4 + x^3 + x^2 + x + 1$ | 50.91 | 0.154 | 70.04 | 0.154 | 56.36 | 0.154 |
| $x^{16} + x^{15} + x^{11} + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1$ | 52.55 | 0.182 | 71.68 | 0.182 | 59.20 | 0.182 |

On the other hand, as expected, the proposed solution, as reported in Table 2 and plotted in Fig. 7, allows a significant power saving, which is typically higher than 20% and often (especially for higher-order polynomials) reaches values around 30%.
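The totals and percentages in Table 2 follow from the tabulated components by simple arithmetic, which the sketch below reproduces for the first row of the table. The helper name `saving_percent` and the variable names are hypothetical, and the component values are copied from Table 2 as printed, so the results agree with the tabulated totals only to within rounding (about 0.01 $\mu\text{W}$).

```python
def saving_percent(p_new, p_ref):
    """Relative power change of p_new with respect to p_ref, in percent."""
    return 100.0 * (p_new - p_ref) / p_ref


# Component values (in microwatts) copied from the first row of Table 2.
p_conv = 7.93 + 0.21           # conventional: FFs + feedback XORs
p_gc = 5.98 + 0.24 + 0.93      # gated clock [22]: gated FFs + XORs + XORNAND gates
p_prop = 6.39 + 0.27           # proposed: gated FFs + reduced XOR network

print(f"conventional {p_conv:.2f}, gated clock {p_gc:.2f}, proposed {p_prop:.2f}")
print(f"gated clock {saving_percent(p_gc, p_conv):+.1f}%, "
      f"proposed {saving_percent(p_prop, p_conv):+.1f}%")
```

The same two lines of arithmetic apply to every row, which makes it easy to see that the XORNAND term is what separates the flip-flop saving from the overall saving in the approach of [22].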

Finally, it is worth noting that, unlike the strategy in [22], the overall power saving of the proposed gating approach increases with the number of taps. Indeed, as the number of taps increases, the capacitive effects of the feedback network also increase, but, according to (7), so does the probability of finding couples of adjacent taps that reduce the number of XOR gates.

For area and delay estimation purposes, we have coded in VHDL and synthesized with the Cadence Genus™ tool the 16-bit LFSRs reported in Table 2, considering both the conventional implementation and the gated-clock implementation in [22]. To estimate the area and delay of the LFSRs exploiting the approach proposed in this paper, we have also implemented the full-custom layout of the power-aware XORNAND circuit in Fig. 4. The area of the LFSRs has then been estimated by summing the area of the standard cells and the area of the power-aware XORNAND circuits used in the different 16-bit LFSR implementations.

**FIGURE 7.** Power reduction of the proposed LFSR (x) and solution given in [22] (+) with respect to the traditional implementation.

Table 3 summarizes the area and critical path delays of the 16-bit LFSRs reported in Table 2, confirming that the proposed approach does not affect the critical path delay, which is, in all cases, set by the feedback path. The area estimations also suggest that the proposed approach results not only in a significant power saving, but also in an area reduction with respect to the approach in [22] (for example, 56.68 $\mu\text{m}^2$ against 66.77 $\mu\text{m}^2$ for the first polynomial in Table 3).
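The area comparison can be quantified directly from the Table 3 entries. The following sketch, with hypothetical names `areas` and `relative_change`, computes the area of the proposed approach relative to both the gated-clock solution in [22] and the conventional LFSR; the numbers are copied from Table 3 and no other data is assumed.

```python
# Areas in square micrometres from Table 3: (conventional, gated clock [22], proposed).
areas = [
    (47.64, 66.77, 56.68),
    (49.28, 68.41, 55.93),
    (50.91, 70.04, 56.36),
    (52.55, 71.68, 59.20),
]


def relative_change(new, ref):
    """Relative change of `new` with respect to `ref`, in percent."""
    return 100.0 * (new - ref) / ref


for conv, gated, proposed in areas:
    vs_gated = relative_change(proposed, gated)
    vs_conv = relative_change(proposed, conv)
    print(f"vs [22]: {vs_gated:+.1f}%   vs conventional: {vs_conv:+.1f}%")
```

With these values the proposed approach comes out roughly 15% to 20% smaller than the solution in [22], while remaining roughly 11% to 19% larger than the conventional LFSR.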

## V. CONCLUSION

An efficient solution to reduce the power consumption of the popular linear feedback shift register has been presented and discussed in detail. The approach uses the CPL design style in some parts and exploits the gated clock also to implement the feedback network, thus reducing the number of XOR gates. The proposed design approach has been validated by simulations in a 28-nm CMOS technology and, compared to the traditional implementation, has been shown to lead to a power reduction of up to 30%, without increasing the critical path delay and with a smaller area than the gated-clock solution in [22].

## REFERENCES

1. C. P. de Souza, F. M. de Assis, and R. C. S. Freire, “A new architecture of test response analyzer based on the Berlekamp–Massey algorithm for BIST,” *IEEE Trans. Instrum. Meas.*, vol. 59, no. 12, pp. 3168–3173, Dec. 2010.
2. R. Oomen, M. K. George, and S. Joseph, “Study and analysis of various LFSR architectures,” in *Proc. Int. Conf. Circuits Syst. Digit. Enterprise Technol. (ICCSDET)*, Dec. 2018, pp. 1–6.
3. M. Mohan and S. S. Pillai, “Review on LFSR for low power BIST,” in *Proc. 3rd Int. Conf. Comput. Methodologies Commun. (ICCMC)*, Mar. 2019, pp. 873–876.
4. K.-J. Lee, Z.-Y. Lu, and S.-C. Yeh, “A secure JTAG wrapper for SoC testing and debugging,” *IEEE Access*, vol. 10, pp. 37603–37612, 2022.
5. S. V. Murugan and B. Sathiyabhamma, “Retraction note to: Bit-swapping linear feedback shift register (LFSR) for power reduction using pre-charged XOR with multiplexer technique in built in self-test,” *J. Ambient Intell. Hum. Comput.*, pp. 6367–6373, Jul. 2021.
6. D. Rupprecht, K. Kohls, T. Holz, and C. Pöpper, “Breaking LTE on layer two,” in *Proc. IEEE Symp. Secur. Privacy (SP)*, May 2019, pp. 1121–1136.
7. C. Manifavas, G. Hatzivasilis, K. Fysarakis, and Y. Papaefstathiou, “A survey of lightweight stream ciphers for embedded systems,” *Secur. Commun. Netw.*, vol. 9, no. 10, pp. 1226–1246, Jul. 2016.
8. S. Nandi, S. Krishnaswamy, B. Zolfaghari, and P. Mitra, “Key-dependent feedback configuration matrix of primitive $\sigma$-LFSR and resistance to some known plaintext attacks,” *IEEE Access*, vol. 10, pp. 44840–44854, 2022.
9. J. Choi and N. Y. Yu, “Secure image encryption based on compressed sensing and scrambling for internet-of-multimedia things,” *IEEE Access*, vol. 10, pp. 10706–10718, 2022.
10. F. M. Mwaniki and H. J. Vermeulen, “Characterization and application of a pseudorandom impulse sequence for parameter estimation applications,” *IEEE Trans. Instrum. Meas.*, vol. 69, no. 6, pp. 3917–3927, Jun. 2020.
11. F. Zanier, G. Bacci, and M. Luise, “Criteria to improve time-delay estimation of spread spectrum signals in satellite positioning,” *IEEE J. Sel. Topics Signal Process.*, vol. 3, no. 5, pp. 748–763, Oct. 2009.
12. Y. Kim, J. Kim, J. Song, and D. Yoon, “Blind estimation of self-synchronous scrambler using orthogonal complement space in DSSS systems,” *IEEE Access*, vol. 10, pp. 66522–66528, 2022.
13. G. Hu, J. Sha, and Z. Wang, “High-speed parallel LFSR architectures based on improved state-space transformations,” *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 25, no. 3, pp. 1159–1163, Mar. 2017.
14. X. Zhang, “A low-power parallel architecture for linear feedback shift registers,” *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 66, no. 3, pp. 412–416, Mar. 2019.
15. X. Zhang and Z. Xie, “Efficient architectures for generalized integrated interleaved decoder,” *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 10, pp. 4018–4031, Oct. 2019.
16. S. Hou, Y. Guo, and S. Li, “A lightweight LFSR-based strong physical unclonable function design on FPGA,” *IEEE Access*, vol. 7, pp. 64778–64787, 2019.
17. S. Baek, G.-H. Yu, J. Kim, C. T. Ngo, J. K. Eshraghian, and J.-P. Hong, “A reconfigurable SRAM based CMOS PUF with challenge to response pairs,” *IEEE Access*, vol. 9, pp. 79947–79960, 2021.
18. M. Goresky and A. M. Klapper, “Fibonacci and Galois representations of feedback-with-carry shift registers,” *IEEE Trans. Inf. Theory*, vol. 48, no. 11, pp. 2826–2836, Nov. 2002.
19. M. E. Hamid and C. H. Chen, “A note to low-power linear feedback shift registers,” *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 45, no. 9, pp. 1304–1307, Sep. 1998.
20. R. S. Katti, X. Ruan, and H. Khatri, “Multiple-output low-power linear feedback shift register design,” *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 53, no. 7, pp. 1487–1495, Jul. 2006.
21. D. S. Mehta, V. Mishra, Y. K. Verma, and S. K. Gupta, “A hardware minimized gated clock multiple output low power linear feedback shift register,” in *Advances in VLSI, Communication, and Signal Processing*. Singapore: Springer, 2020, pp. 367–376.
22. W. Aloisi and R. Mita, “Gated-clock design of linear-feedback shift registers,” *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 55, no. 6, pp. 546–550, Jun. 2008.
23. R. David, *Random Testing of Digital Circuits. Theory and Application*. New York, NY, USA: Marcel Dekker, 1998.
24. L. Benini, A. Bogliolo, and G. De Micheli, “A survey of design techniques for system-level dynamic power management,” *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 8, no. 3, pp. 299–316, Jun. 2000.
25. G. Palumbo, F. Pappalardo, and S. Sannella, “Evaluation on power reduction applying gated clock approach,” in *Proc. ISCAS*, May 2002, pp. IV85–IV88.
26. J. M. Rabaey and M. Pedram, *Low Power Design Methodologies*. Boston, MA, USA: Kluwer, 1997.
27. R. Zimmermann and W. Fichtner, “Low-power logic styles: CMOS versus pass-transistor logic,” *IEEE J. Solid-State Circuits*, vol. 32, no. 7, pp. 1079–1090, Jul. 1997.



**GIANLUCA GIUSTOLISI** (Senior Member, IEEE) was born in Catania, Italy, in 1971. He received the Laurea degree (*summa cum laude*) in electronic engineering and the Ph.D. degree in electrical engineering from the University of Catania, in 1995 and 1999, respectively.

Since 2003, he has been teaching courses on electronic devices and analog electronics in undergraduate and postgraduate degrees. Currently, he is an Associate Professor at the Dipartimento di Ingegneria Elettrica Elettronica e Informatica (DIEEI), University of Catania. He is the author of more than 100 scientific papers in refereed international journals and conferences. He is also the author of the Italian course-book *Introduzione ai Dispositivi Elettronici* (Franco Angeli). His research interests include analog circuits with particular emphasis on feedback circuits, compensation techniques, voltage regulators, bandgap voltage references, low-voltage circuits, and device modeling.