

# Design and Analysis of a Low Power Conditional Capture Flip Flop in 32 nm CMOS

Hanna Kaibni

Department of Computer Engineering  
Birzeit University  
Ramallah, Palestine  
1220214

Jamil Habash

Department of Computer Engineering  
Birzeit University  
Ramallah, Palestine  
1220194

Abdallah Kokash

Department of Computer Engineering  
Birzeit University  
Ramallah, Palestine  
1220116

**Abstract**—This paper presents a low-power Conditional-Capture Flip-Flop (CCFF) incorporating explicit clock gating to reduce unnecessary clock switching activity. The proposed design enables the clock only when the input data differs from the stored output, using a clock-gating condition based on the logical XOR between data and output. Designed and evaluated in a 32 nm PTM low-power CMOS technology at 1 V supply, the flip-flop achieves significant clock and total dynamic power reduction compared to a conventional master-slave flip-flop and a reference CCFF without clock gating. Pre-layout and post-layout simulations demonstrate the effectiveness of the proposed architecture, with detailed timing, power, and area analysis.


## I. INTRODUCTION

Clocking accounts for a significant portion of the total power consumption in modern synchronous digital systems. As technology scales and clock frequencies increase, reducing clock-related power has become critical. Conventional master-slave flip-flops toggle internal nodes and clocked transistors every cycle regardless of data activity, leading to unnecessary dynamic power dissipation.

Conditional-Capture Flip-Flops (CCFFs) have been proposed to mitigate this issue by suppressing internal switching when the input data remains unchanged. However, many CCFF implementations still allow the clock signal to toggle unconditionally, limiting achievable power savings. In this work, we enhance a CCFF by incorporating explicit clock gating, such that the clock is activated only when a state update is required.

The main contributions of this paper are:

1. A clock-gated CCFF architecture using a simple data-dependent enable condition.
2. Transistor-level implementation and evaluation in 32 nm CMOS.
3. Comprehensive pre-layout and post-layout analysis and comparison with reference flip-flop designs.

## II. PROPOSED CCFF ARCHITECTURE AND DESIGN

### A. Motivation and Problem Statement

Conventional master-slave flip-flops capture data at every active clock edge, resulting in unnecessary internal switching activity even when the input data remains unchanged ($D = Q$). This redundant switching leads to increased dynamic power consumption, which becomes particularly inefficient in low-activity or data-correlated workloads.

In modern CMOS technologies, dynamic power is dominated by clock switching and the associated internal node toggling within sequential elements. Moreover, many practical digital systems exhibit a high proportion of consecutive clock cycles in which the stored data does not change. Under such conditions, unconditional clocking wastes energy without contributing to useful computation.
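For reference, the switching component of dynamic power is commonly written in the first-order textbook form below. The symbols $\alpha$ (switching activity factor), $C_L$ (switched capacitance), and $f_{\text{clk}}$ (clock frequency) are standard notation introduced here for illustration; they are not quantities reported in this paper.

$$P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f_{\text{clk}}$$

Under this model a free-running clock net has $\alpha = 1$, whereas a data net that rarely changes has $\alpha$ well below 1, which is why clocked nodes tend to dominate when data activity is low.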

Conditional-capture techniques exploit this observation by enabling state updates only when the input data differs from the stored output. By preventing redundant internal transitions, conditional capture statistically reduces clock-related power dissipation while preserving correct functionality.

The key idea of this work is to employ data-dependent control logic that disables internal switching when  $D = Q$ . This approach achieves significant clock power reduction while maintaining acceptable timing performance and modest area overhead, making it well suited for low-power synchronous designs.

### B. Proposed Architecture



Fig. 1. Block Diagram of Gated-Clocking Conditional Capture Flip Flop

Fig. 1 shows the proposed clock-gated Conditional-Capture Flip-Flop (CCFF) architecture. The design consists of three main components: data-dependent clock-gating logic, a conditional-capture latch core, and a static output latch.

The clock-gating logic generates a gated clock signal by logically ANDing the system clock with a data-activity condition derived from the exclusive-OR (XOR) of the input data $D$ and the stored output $Q$. When $D = Q$, no state update is required and the gated clock remains disabled. When $D \neq Q$, the gated clock follows the system clock, allowing the flip-flop to capture new data. This ensures that clock transitions and internal switching occur only when necessary.
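The gating condition described above can be written as a one-line Boolean function. The following is a minimal behavioral sketch, not the transistor-level circuit; the function name `gated_clock` and the 0/1 integer encoding of signals are assumptions made for illustration.

```python
def gated_clock(clk: int, d: int, q: int) -> int:
    """Return the gated clock, CLK AND (D XOR Q), for 0/1 signal values."""
    enable = d ^ q  # 1 only when the input differs from the stored output
    return clk & enable


# Enumerate every input combination as a small truth table.
truth_table = [
    (clk, d, q, gated_clock(clk, d, q))
    for clk in (0, 1)
    for d in (0, 1)
    for q in (0, 1)
]

for row in truth_table:
    print("CLK=%d D=%d Q=%d -> GCLK=%d" % row)
```

Only the two rows with the clock high and $D \neq Q$ produce a high gated clock; every other combination keeps it low.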

The CCFF core employs a dynamic internal node X that is conditionally precharged and evaluated based on the gated clock. During the inactive phase of the gated clock, the internal node is precharged, while during the active phase, it is selectively discharged depending on the data input. This conditional operation suppresses redundant transitions on internal nodes when no data change occurs, thereby reducing dynamic power consumption.

The output stage consists of a static latch that stores the evaluated data and provides complementary outputs Q and QB. This latch ensures robust state retention and isolates the dynamic internal node from load variations, improving noise immunity and output stability.

By combining explicit clock gating with conditional internal evaluation, the proposed architecture reduces both clock-network switching activity and internal node toggling. Compared to conventional CCFF designs that rely solely on internal conditional discharge, the proposed approach further suppresses unnecessary clock transitions at the flip-flop level, leading to enhanced power efficiency with modest additional logic overhead.
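To make the activity argument concrete, the sketch below counts how many clock pulses reach the storage element with and without gating for a given data stream. It is an idealized cycle-level model under stated assumptions: one data sample per clock cycle, a capture on every delivered pulse, and no accounting for the power of the gating logic itself. The function names and the random workload generator are hypothetical and are not taken from the paper.

```python
import random


def count_clock_pulses(data_stream, q_init=0):
    """Count clock pulses seen by the storage element, ungated vs. gated.

    Idealized model: one data sample per cycle; the flip-flop captures D
    whenever it receives a clock pulse.
    """
    q = q_init
    ungated = 0
    gated = 0
    for d in data_stream:
        ungated += 1      # a free-running clock pulses every cycle
        if d != q:        # XOR condition: D differs from Q
            gated += 1    # the gated clock pulses only on a data change
            q = d         # state update
    return ungated, gated


def random_stream(n_cycles, toggle_prob, seed=0):
    """Hypothetical workload: D flips with probability toggle_prob per cycle."""
    rng = random.Random(seed)
    d = 0
    stream = []
    for _ in range(n_cycles):
        if rng.random() < toggle_prob:
            d ^= 1
        stream.append(d)
    return stream


for p in (0.05, 0.25, 0.5):
    total, delivered = count_clock_pulses(random_stream(10_000, p))
    print(f"toggle_prob={p:.2f}: {delivered}/{total} pulses delivered")
```

In this model the number of delivered pulses equals the number of data changes, so the benefit of gating grows as the data toggle rate falls.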



Fig. 2. Gated-Clocking Conditional Capture Flip Flop

### C. Comparison with Conventional Designs

In a conventional master–slave flip-flop, both the master and slave latches are driven directly by the clock and its complement, resulting in internal switching activity at every clock edge regardless of data activity. This unconditional clocking leads to significant dynamic power dissipation, particularly in low-data-toggle scenarios.

Reference CCFF designs reduce internal switching by conditionally enabling evaluation paths based on data changes; however, the clock signal itself continues to toggle unconditionally. As a result, clock-related power remains a dominant component of the total power consumption.

In contrast, the proposed clock-gated CCFF suppresses unnecessary clock transitions by explicitly disabling the clock when $D = Q$. This dual-level suppression, at both the clock and internal-node levels, provides greater power savings than both master–slave flip-flops and reference CCFF architectures, while maintaining comparable timing performance and area efficiency.

## III. PRE-LAYOUT SIMULATION AND ANALYSIS

### A. Transistor-Level Netlist and Simulation Setup

The proposed clock-gated conditional-capture flip-flop (CCFF) was validated via transistor-level simulations in 32-nm low-power CMOS using BSIM4 models to capture short-channel and leakage effects. Simulations were performed under typical-typical (TT) process conditions at 1.0 V and 27 °C. A 24 ns, 50% duty-cycle clock with 1 ns rise/fall times was applied, along with pulsed data inputs to test both transition and no-transition scenarios. The output node Q was loaded with 20 fF, and a simulation timestep of 50 ps ensured accurate timing extraction.
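The clock stimulus described above maps naturally onto a SPICE `PULSE` source, whose arguments are initial value, pulsed value, delay, rise time, fall time, pulse width, and period. The helper below is a sketch of how such a line could be generated; the source and node names are hypothetical, and the choice to measure the 50% duty cycle between the midpoints of the edges is an assumption, since the paper does not state how the duty cycle was defined.

```python
def pulse_source(name, node, v_high, period_ns, rise_ns, fall_ns, delay_ns=0.0):
    """Format a SPICE PULSE voltage source with a 50% duty cycle.

    Assumption: duty cycle is measured between the 50% points of the edges,
    so the flat-top width is period/2 minus half of each edge time.
    """
    width_ns = period_ns / 2 - (rise_ns + fall_ns) / 2
    return (
        f"V{name} {node} 0 PULSE(0 {v_high:g} {delay_ns:g}n "
        f"{rise_ns:g}n {fall_ns:g}n {width_ns:g}n {period_ns:g}n)"
    )


# 24 ns period, 1 ns rise/fall, 1.0 V swing, as in the setup described above.
clock_line = pulse_source("clk", "clk", v_high=1.0, period_ns=24, rise_ns=1, fall_ns=1)
print(clock_line)
```

With these inputs the flat-top width evaluates to 11 ns, so that the high phase between edge midpoints spans half of the 24 ns period.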

### B. Timing Analysis

Transient simulations yielded the key timing parameters. The clk $\rightarrow$ Q delays are 32.6 ns (rising) and 51.9 ns (falling); the asymmetry reflects PMOS/NMOS drive imbalance and keeper-device effects. The CCFF exhibits a negative setup time of -32 ns, enabling correct data capture even when the data transition follows the clock edge, which demonstrates inherent time-borrowing. The hold time is 51 ns, governed by the gated-clock window and internal discharge paths. These characteristics are consistent with the behavior reported for earlier CCFF and pulse-triggered flip-flops, and functional correctness is preserved.
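A clk-to-Q delay of this kind is typically extracted from sampled waveforms as the time between the 50% crossings of the clock and the output. The sketch below shows one way to do that with linear interpolation between samples. It assumes a rising active clock edge and a 50% measurement threshold, neither of which is stated explicitly in the paper, and the waveform data are synthetic.

```python
def crossing_time(times, values, threshold, rising=True, t_start=0.0):
    """First time at or after t_start that the waveform crosses threshold in
    the given direction, using linear interpolation; None if it never does."""
    for i in range(1, len(times)):
        if times[i] < t_start:
            continue
        v0, v1 = values[i - 1], values[i]
        if rising:
            crossed = v0 < threshold <= v1
        else:
            crossed = v0 > threshold >= v1
        if crossed:
            frac = (threshold - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


def clk_to_q_delay(t, clk, q, vdd=1.0, q_rising=True):
    """Delay from the 50% point of the first rising clock edge to the
    following 50% crossing of Q (rising or falling, per q_rising)."""
    t_clk = crossing_time(t, clk, vdd / 2, rising=True)
    if t_clk is None:
        return None
    t_q = crossing_time(t, q, vdd / 2, rising=q_rising, t_start=t_clk)
    if t_q is None:
        return None
    return t_q - t_clk


# Synthetic example: times in ns, voltages in V.
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
clk = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
q = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
delay = clk_to_q_delay(t, clk, q)
print(f"clk-to-Q delay: {delay} ns")
```

The same helper can be applied to a falling output transition by passing `q_rising=False`.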

### C. Power Analysis

Average power was measured over multiple clock cycles. The CCFF dissipates 1.23  $\mu\text{W}$  at 1.0 V, with negligible clock power (tens of femtowatts) due to the integrated clock-gating logic. This confirms that redundant clock transitions are effectively suppressed, significantly reducing dynamic power, particularly under low-data-activity conditions.

### D. Simulation Results Summary



Fig. 3. Gated-Clocking Conditional Capture Flip Flop Schematic Simulation

TABLE I  
PRE-LAYOUT TIMING AND POWER RESULTS

| Parameter             | Value               |
|-----------------------|---------------------|
| Technology            | 32 nm CMOS (PTM LP) |
| Supply Voltage        | 1.0 V               |
| Temperature           | 27 °C               |
| Setup Time            | -32 ns              |
| Hold Time             | 51 ns               |
| Clk-to-Q Delay (Rise) | 32.6 ns             |
| Clk-to-Q Delay (Fall) | 51.9 ns             |
| Average Power         | 1.23 $\mu$W         |
| Clock Power           | $\approx$ 0         |

## IV. PROPOSED CCFF LAYOUT

### A. Physical Layout and Parasitic Extraction Methodology



Fig. 4. Gated-Clocking Conditional Capture Flip Flop Layout

The physical layout was implemented in the Electric VLSI Design System using a 32-nm CMOS technology with a layout scale factor of 16. The layout was constructed with a strong emphasis on minimum-area realization and compact device placement. Functionally related transistors were placed close to each other to shorten interconnect lengths and reduce the total amount of metal routing, which directly lowers interconnect resistance and capacitance.

To further reduce parasitics, shared diffusion regions were used extensively between adjacent transistors, minimizing diffusion breaks, contact count, and junction capacitance. In addition, shared polysilicon gates were employed for devices driven by the same control signal, reducing redundant poly routing and lowering both polysilicon resistance and gate-related coupling capacitances. These layout techniques significantly reduced local RC parasitics compared to a non-optimized placement.

Resistive parasitics at diffusion–metal and metal–metal interfaces were reduced by inserting multiple contacts and parallel vias wherever permitted by the 32-nm design rules. This lowered effective contact and via resistance, especially on high-activity and clock-driven nodes. Routing was performed using the shortest possible metal paths, with wider-than-minimum metal applied selectively on critical nets to further reduce series resistance.

Parasitic extraction was performed using Electric's built-in extractor, generating a netlist that includes lumped diffusion, polysilicon, and metal resistances, along with ground and coupling capacitances. The extracted netlist was simulated in SPICE to evaluate layout-induced delay and parasitic loading effects under realistic operating conditions.

### B. Layout Breakdown

The total layout area is divided into active device regions and interconnect-dominated regions. The active regions, consisting of diffusion, polysilicon, and local Metal-1 routing, implement the core logic elements such as inverters, transmission-gate structures, and keeper circuitry. These transistor-dense blocks are clustered mainly in the central and left portions of the layout and occupy most of the core area. Metal-1 (blue) is used predominantly within these regions for short intra-cell connections between transistor terminals and nearby contacts.

The remaining area is consumed by interconnect structures and vertical layer transitions. Metal-2 (purple) is used for medium-range routing between logic blocks and for internal signal distribution, providing lateral connectivity while relieving congestion on Metal-1. Metal-3 (yellow) is employed for long-distance and high-fanout routing, including global signal paths and segments of the power distribution network, due to its lower sheet resistance and suitability for wide, low-RC traces.

An upper routing layer, Metal-4 (blue striped), is selectively used for global interconnect crossings and over-the-block routing to avoid interference with lower-layer signal paths. This multilayer routing strategy enables regular signal flow, reduces parasitic resistance on critical nets, and supports efficient area utilization while maintaining full compliance with the 32-nm, scale-16 design rules.

### C. Design Rules and Technology Node Constraints

All layout geometries strictly comply with the 32-nm CMOS design rules scaled by a factor of 16, including minimum feature size, minimum spacing, enclosure, and overlap requirements for diffusion, polysilicon, and metal layers. Minimum-width and minimum-spacing constraints were enforced for all routing layers, while wider-than-minimum metal segments were selectively used for high-fanout, high-current, and timing-critical nets to reduce resistive voltage drops and electromigration risk. Poly-to-diffusion spacing, well-enclosure rules, and active-to-active separation constraints were carefully respected to prevent leakage, latch-up, and reliability-related failures.

The final layout was verified using design rule checking (DRC) and layout-versus-schematic (LVS) to ensure full rule compliance and topological equivalence with the schematic implementation.

## V. CCFF SIMULATION (POST LAYOUT)

### A. Layout Simulation Results



Fig. 5. Gated-Clocking Conditional Capture Flip Flop Layout Simulation

The simulated waveforms demonstrate the correct operation of the clock-gated conditional-capture flip-flop (CCFF). When the input data D differs from the stored output Q, the XOR-based gating logic asserts the gated clock V<sub>gc</sub>, allowing the flip-flop to capture the new data on the active clock edge. As a result, Q updates accordingly while the complementary output Q<sub>b</sub> switches inversely. In contrast, when D=Q, the gated clock remains suppressed, preventing unnecessary internal switching and clock activity. This behavior confirms that the CCFF successfully reduces redundant clock toggling while preserving correct data capture functionality.

### B. Post-Layout Timing and Power Results

TABLE II  
POST-LAYOUT TIMING AND POWER RESULTS

| Parameter             | Value               |
|-----------------------|---------------------|
| Technology            | 32 nm CMOS (PTM LP) |
| Supply Voltage        | 1.0 V               |
| Temperature           | 27 °C               |
| Setup Time            | -32.0 ns            |
| Hold Time             | 51.0 ns             |
| Clk-to-Q Delay (Rise) | 33.0 ns             |
| Clk-to-Q Delay (Fall) | 52.0 ns             |
| Average Power         | 0.642 $\mu$W        |
| Clock Power           | $\approx$ 0         |

A comparison between pre-layout and post-layout results shows that the proposed clock-gated CCFF maintains nearly identical timing characteristics while achieving a substantial reduction in power after parasitic extraction. The setup time remains unchanged at -32 ns, and the hold time is preserved at 51 ns, indicating that layout parasitics do not degrade the functional timing margins. The clk-to-Q delay increases only slightly from 32.6 ns to 33.0 ns for the rising transition and from 51.9 ns to 52.0 ns for the falling transition, reflecting a minimal impact from interconnect resistance and capacitance.

In contrast, the average power consumption improves significantly after layout, dropping from 1.23 $\mu$W in pre-layout to 0.642 $\mu$W in post-layout simulation. This reduction is attributed to a more realistic modeling of parasitic effects and the effectiveness of the clock-gating scheme in suppressing unnecessary switching activity. In both cases, the measured clock power remains approximately zero, confirming that the gated clock successfully eliminates redundant clock transitions. Overall, the close agreement in timing and the improved post-layout power figures validate the robustness and low-power efficiency of the proposed CCFF implementation.
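As a worked check on the figures in Tables I and II, the relative change in average power from pre-layout to post-layout can be computed directly. The symbols $P_{\text{pre}}$ and $P_{\text{post}}$ are introduced here for illustration only.

$$\frac{P_{\text{pre}} - P_{\text{post}}}{P_{\text{pre}}} = \frac{1.23 - 0.642}{1.23} \approx 47.8\%$$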

### C. Analysis of Performance Degradation and Margin Assessment

Post-layout parasitic extraction introduces only minor performance degradation, indicating that the layout is electrically robust. The clk $\rightarrow$ Q rise delay increases from 32.6 ns (pre-layout) to 33.0 ns (post-layout), i.e., +0.4 ns (+1.23%), while the clk $\rightarrow$ Q fall delay increases from 51.9 ns to 52.0 ns, i.e., +0.1 ns (+0.19%). This small delay penalty is consistent with added interconnect RC from routed metals, vias, and diffusion capacitances, and it suggests that critical nets were routed efficiently with limited series resistance impact.

From a margin assessment perspective, the setup time (-32 ns) and hold time (51 ns) remain unchanged pre- vs. post-layout, which implies that timing slack is preserved and that the design's functional window is not tightened by extracted parasitics. In other words, although clk $\rightarrow$ Q slightly degrades, the capture conditions (setup/hold constraints) remain stable, so the flip-flop retains essentially the same timing margins under realistic post-layout loading.

## VI. INNOVATION, RESULTS AND COMPARISON

### A. Comparison With Prior Work

Table III summarizes a comparison between the proposed clock-gated CCFF and three representative low-power flip-flop designs reported in the literature: the original CCFF [1], the conditional-discharge flip-flop (CDFF) [2], and a representative pulse-triggered flip-flop (P-FF) [3]. The comparison focuses on clock power, total dynamic power, area, setup time, hold time, and clock-to-Q (clk $\rightarrow$ Q) delay. The proposed design is implemented and evaluated in a 32-nm PTM low-power CMOS technology at 1.0 V. Post-layout simulations show an average power consumption of 0.642 $\mu$W with negligible clock power, a negative setup time of -32 ns, a hold time of 51 ns, and clk $\rightarrow$ Q delays of 33.0 ns (rise) and 52.0 ns (fall).

In contrast, the original CCFF architecture reported in [1] achieves statistical power reduction by suppressing redundant internal transitions but still incurs unconditional clock toggling, leaving clock power as a dominant component of total power. Although it exhibits negative setup time and low data-to-output latency, its clock network activity fundamentally limits further power scaling.

The CDFF design in [2] further reduces internal switching activity by conditionally disabling the discharge path of high-activity internal nodes, leading to reported energy savings of up to 39% at comparable delay and the smallest power-delay product (PDP) among several pulsed flip-flops. However, CDFF still relies on unconditional clock delivery and therefore does not eliminate redundant clock transitions. Moreover, its energy and delay results are reported in a 0.18-$\mu$m CMOS technology at 1.8 V, which complicates direct numerical comparison with deeply scaled designs.

TABLE III  
COMPARISON OF THE PROPOSED CLOCK-GATED CCFF WITH A CONVENTIONAL MASTER-SLAVE FF AND PRIOR CCFF DESIGNS

| Parameter                 | Proposed GC-CCFF    | Master-Slave FF (This Work) | CCFF [1]          | CDFF [2]           | P-FF [3]        |
|---------------------------|---------------------|-----------------------------|-------------------|--------------------|-----------------|
| Technology                | 32-nm CMOS (PTM LP) | 32-nm CMOS (PTM LP)         | 0.35-$\mu$m CMOS  | 0.18-$\mu$m CMOS   | 90-nm CMOS      |
| $V_{DD}$                  | 1.0 V               | 1.0 V                       | 3.3 V             | 1.8 V              | 1.0 V           |
| Clock Power               | $\approx 0$         | 21.24 nW                    | Non-zero          | Non-zero           | Low             |
| Total Dynamic Power       | 0.642 $\mu$W        | 7.46 $\mu$W                 | Reduced vs. MS    | Lowest among peers | Low             |
| Energy per Operation      | 15.4 fJ             | 178.9 fJ                    | Not Recorded      | Not Recorded       | Not Recorded    |
| Area                      | Compact             | Conventional                | Moderate          | Moderate           | Larger          |
| Setup Time                | -32 ns              | -32 ns                      | Negative          | Negative           | $\sim$50-70 ps  |
| Hold Time                 | 51 ns               | 51 ns                       | Moderate          | Moderate           | Long            |
| Clk $\rightarrow$ Q Delay | 33-52 ns            | 48-72 ns                    | Low               | Very low           | Minimum         |

The pulse-triggered FF (P-FF) in [3] demonstrates high speed and favorable PDP through conditional pulse enhancement, achieving short minimum D-to-Q delays and competitive clock tree power. Nevertheless, P-FFs inherently suffer from extended hold-time requirements due to the pulse-based capture mechanism and require careful pulse-width control to ensure functional robustness across process corners. In addition, explicit or implicit pulse-generation circuitry introduces extra design complexity and sensitivity to variation.

### B. Energy and Power-Area Efficiency

From the post-layout results, the proposed CCFF dissipates 0.642 $\mu$W at a 24-ns clock period, corresponding to an energy per operation of approximately 15.4 fJ. Owing to the negligible clock power, this energy is dominated by data-dependent switching only. When combined with the compact layout implementation, this yields a favorable power $\times$ area product relative to prior CCFF-based and pulse-triggered designs, in which either clock power or pulse-generation overhead contributes significantly to total energy.
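The energy figure follows from multiplying average power by the clock period. The short sketch below reproduces that arithmetic for the proposed design and, using the total dynamic power listed in Table III, for the master-slave reference; the function name is hypothetical.

```python
def energy_per_cycle_fj(avg_power_uw, clock_period_ns):
    """Energy per clock cycle in femtojoules: E = P_avg * T_clk.

    1 uW * 1 ns = 1e-6 W * 1e-9 s = 1e-15 J = 1 fJ, so the product of the
    two inputs is already in femtojoules.
    """
    return avg_power_uw * clock_period_ns


proposed_fj = energy_per_cycle_fj(0.642, 24)     # post-layout GC-CCFF
master_slave_fj = energy_per_cycle_fj(7.46, 24)  # master-slave FF, Table III
print(f"proposed: {proposed_fj:.1f} fJ, master-slave: {master_slave_fj:.1f} fJ")
print(f"ratio: {master_slave_fj / proposed_fj:.1f}x")
```

The product gives about 15.4 fJ for the proposed design and about 179.0 fJ for the master-slave flip-flop. The latter is slightly above the 178.9 fJ listed in Table III, presumably because the tabulated power value is rounded.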

### C. Discussion of Advantages and Limitations

The proposed architecture provides dual-level suppression of unnecessary activity: internal node evaluation is conditionally enabled, and the clock is gated when  $D = Q$ . This achieves near-zero clock power and substantial total dynamic power reduction compared with conventional CCFFs and CDFFs, while preserving the negative setup time advantage. Compact layout techniques, including shared diffusion and parallel contacts, further reduce parasitic RC and area overhead.

A limitation is the relatively long clk $\rightarrow$ Q delay and hold time in the nanosecond range, due to conservative transistor sizing, gated-clock buffering, and dynamic-node keeper devices. While acceptable for low-frequency or energy-constrained applications, these characteristics may limit use in high-performance pipelines. Additionally, XOR-based gating introduces modest logic overhead and requires careful verification to prevent glitch-induced clock enabling.
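The glitch concern can be illustrated with a small sampled-signal sketch. It compares a plain AND gate with a latch-based enable, the structure commonly used in standard clock-gating cells. The latch variant is not part of the design described in this paper; it is shown only as an assumption about one possible mitigation, and the signal samples are hypothetical.

```python
def and_gated_clock(samples):
    """Plain AND gating: the output follows the enable at all times."""
    return [clk & en for clk, en in samples]


def latch_gated_clock(samples):
    """Latch-based gating: the enable is sampled only while the clock is low
    and held while the clock is high, so mid-phase enable changes are ignored."""
    latched = 0
    out = []
    for clk, en in samples:
        if clk == 0:
            latched = en  # latch is transparent during the low phase
        out.append(clk & latched)
    return out


# (clk, enable) samples: the enable glitches during the first high phase and
# drops early during the second high phase.
samples = [
    (0, 0), (1, 0), (1, 1), (1, 0),  # spurious enable pulse while clk is high
    (0, 0), (0, 1), (1, 1), (1, 0),  # enable valid at the edge, then drops
    (0, 0),
]

print("AND  :", and_gated_clock(samples))
print("latch:", latch_gated_clock(samples))
```

With plain AND gating, the spurious enable produces a narrow pulse in the first high phase and the early drop truncates the second pulse. The latch-based variant ignores both mid-phase changes. Whether pulse truncation is harmful or intended depends on the flip-flop core, which is why this is presented as an illustration rather than a recommendation.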

Overall, the proposed clock-gated CCFF offers superior clock-power suppression and energy efficiency with minimal timing degradation, making it well suited for low-activity, energy-constrained workloads.

## VII. CONCLUSION AND FUTURE WORK

This paper presented a low-power clock-gated Conditional-Capture Flip-Flop (CCFF) implemented in 32-nm PTM CMOS. By combining data-dependent conditional capture with XOR-based clock gating, the design effectively suppresses unnecessary internal switching and redundant clock transitions, achieving near-zero clock power while maintaining correct functionality and robust timing.

Transistor-level pre- and post-layout simulations show significant reductions in total dynamic power compared with conventional master-slave and reference CCFF designs. Post-layout results confirm an average power of 0.642 $\mu$W at 1.0 V, negligible clock power, a negative setup time of -32 ns, and stable hold and clk $\rightarrow$ Q delays, indicating that layout parasitics do not compromise performance.

Compared with prior CCFF, CDFF, and pulse-triggered flip-flops, the proposed CCFF offers superior clock-power suppression and favorable energy efficiency, especially under low-activity or data-correlated workloads. Although the absolute clk $\rightarrow$ Q delay and hold time are in the nanosecond range, they remain acceptable for low-frequency, energy-constrained applications.

Future work includes optimizing transistor sizing and keeper strength to reduce delays, exploring alternative clock-gating conditions with lower logic overhead, evaluating robustness under process, voltage, and temperature (PVT) variations, and integrating the CCFF into larger synchronous blocks. Adapting the design to advanced nodes and standard-cell libraries will further validate its applicability in modern low-power digital systems.

## REFERENCES

- [1] B.-S. Kong, S.-S. Kim, and Y.-H. Jun, "Conditional-capture flip-flop for statistical power reduction," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 8, pp. 1263-1271, Aug. 2001.
- [2] P. Zhao, T. K. Darwish, and M. A. Bayoumi, "High-performance and low-power conditional discharge flip-flop," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 12, no. 5, pp. 477-484, May 2004.
- [3] S. Roy and S. C. Prasad, "Low-power pulse-triggered flip-flop design with conditional discharge technique," *International Journal of VLSI Design & Communication Systems*, vol. 5, no. 4, pp. 1-10, Aug. 2014.