

# A Static Contention-Free Differential Flip-Flop in 28nm for Low-Voltage, Low-Power Applications

Gicheol Shin, Eunyoung Lee, Jongmin Lee, Yongmin Lee, Yoonmyung Lee  
 Sungkyunkwan University, Suwon, Korea  
 gcshin1127@skku.edu

**Abstract** — A Static Contention-free Differential Flip-Flop (SCDFF) is presented in 28nm CMOS for low-voltage and low-power applications. Using footed differential latches, the SCDFF offers fully static and contention-free operation without redundant internal clock toggling, while occupying the same area as a conventional transmission-gate flip-flop (TGFF). The fully static and contention-free operation provides high variation tolerance in the low supply voltage regime, enabling wide-range voltage scalability (1V to 0.3V). Measurement results from a test chip fabricated in 28nm CMOS technology show that power consumption is reduced by 64%/56% at 0%/10% activity at 1V compared with the TGFF. All 100 dies from 5 process corners were functional at supply voltages as low as 0.28V.

**Keywords** — flip-flop, static, contention-free, variation tolerant

## I. INTRODUCTION

Near-threshold voltage (NTV) computing, where the supply voltage is approximately equal to the transistor threshold voltage, offers a promising approach to achieving high energy efficiency. However, its voltage scaling is limited by its sensitivity to PVT variations [1-4]. At NTV, dynamic nodes are highly susceptible to variations, and ratioed logic results in poor robustness. Thus, to achieve energy-efficient and robust operation at NTV, sequential circuits require the following attributes: 1) fully static operation, 2) contention-free transitions, 3) no redundant internal toggling, to minimize power consumption, and 4) minimum area overhead. To date, reported single-phase-clocked flip-flop (FF) designs [5-9] suffer from residual internal clock toggling and/or functional failure due to a lack of attribute 1 or 2. For example, a differential-type FF [5] completely removes redundant toggling but suffers from contention. In this work, a Static Contention-free Differential Flip-Flop (SCDFF) is proposed that removes contention from the differential-type FF using footers/headers and ground/supply bridges, achieving fully static operation and contention-free transitions while maintaining no redundant internal toggling and minimum area overhead.

## II. FLIP-FLOP DESIGN METHODOLOGY

### A. Reliable Low Power Latch Design

Fig.1 shows different types of negative latch structures and their characteristics. A conventional transmission-gate (TG) latch is fully static and contention-free, offering good voltage scalability down to low-voltage regimes. However, its redundant internal clock (CKB) and 6 effective clock-loaded transistors result in high power consumption. To reduce clock-related power consumption, a Differential Latch (DL) structure was proposed [5] in which the data input and its complement are exploited to remove CKB. This removes all redundant internal clock toggling, significantly reducing power consumption. However, a conventional DL suffers from strong contention between the retention and driving circuitry during a data write, which makes it susceptible to PVT variations. An Adaptive-Coupling Latch (ACL) [5] was proposed to weaken state retention during a data write, making the contention weaker and the latch more tolerant to variations. However, because the weakened state retention relies on NMOS pull-up or PMOS pull-down, the state retention nodes (DN, DI) are driven with a Vth drop, so the latch is not fully static and remains susceptible to variations at low supply voltage. To address this issue, a Footed Latch is proposed that, instead of weakening the retention, cuts off the pull-down paths of the back-to-back inverters with footers controlled by the data input. If the next data to be latched differs from the current data, the footers cut off the undesired pull-down paths to prevent contention. However, this pull-down cut-off remains active during the retention phase (CK=1), leaving the storage nodes (DI/DN) floating when the D/DB input switches from its original value. To prevent this, a ground bridge connects the two footers, as shown on the right in Fig.1, guaranteeing a static pull-down on DI/DN. Note that one of the footers is always on because of their complementary control (D/DB). This implementation makes the proposed fully Static Footed Differential Latch (SFDL) robust against PVT variations and aggressively voltage-scalable.

Fig.1: Characteristics of conventional TG latch, differential latch, adaptive coupling latch and proposed footed latches.

Fig.2: Cause of failures in ACFF and schematic of SCDFF and its operation

### B. Reliable Low Power Flip-Flop Design

The SCDFF is built from two complementary SFDLs, and Fig.2 compares the SCDFF with an Adaptive-Coupling FF (ACFF) [5]. An ACFF can be built in two combinations: ACL + DL (22T ACFF) or ACL + ACL (26T ACFF). While the 22T ACFF suffers from strong contention in the slave latch during a data write, the 26T ACFF suffers from weak retention with a Vth drop in the slave latch, as described earlier (Fig.2). Moreover, the diffusion connection between the master and slave latches through the clock transistor can potentially allow the slave latch to upset the value stored in the master latch. In contrast, the SFDLs in the SCDFF are connected without a diffusion connection and achieve fully static and contention-free operation. Fig.2 also shows the operation of the SCDFF when the next data (D) and the current data ($Q_{prev}$) differ. When CK=0, depending on the DI/DN value, the pull-up path of one of the back-to-back inverters is cut off by a header. The current '1' state at $Q_1/Q_N$ is still pulled up through the supply-bridging transistor. As CK rises, for the storage node that holds '1' ($Q_N$ or $Q_1$), the pull-up through the bridge is cut off while the pull-down through the input NMOS stack is activated, resulting in a contention-free state toggle. The master latch operates in a similar manner because it is the dual of the slave latch.



Fig.3: Characteristics summary and voltage scalability of state-of-the-art flip-flops

## III. MEASUREMENT RESULTS

### A. Flip-Flop Characteristics Comparison

The characteristics and voltage scalability of state-of-the-art FFs are shown in Fig.3. Many of these FFs suffer from high power consumption due to redundant internal clocks [6,7] or from limited variation tolerance due to contention [5,7,8] or dynamic state nodes [9]. Details on the reliability issues of CSFF [9], TCFF [8] and 18TSPC [7] are also shown on the right-hand side of Fig.3. The proposed SCDFF is the only FF with fully static, contention-free operation without redundant internal clock toggling, allowing low-power operation with aggressive voltage scaling.
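As a quick consistency check of this claim, the attribute rows of Table I can be encoded and filtered for three of the attributes listed in Section I (fully static, contention-free, no redundant internal clock). This is a minimal Python sketch; the `FlipFlop` record and the `meets_all` helper are hypothetical names introduced for illustration, and the boolean values are transcribed from Table I rather than derived from circuit analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlipFlop:
    """Hypothetical record holding the qualitative attributes listed in Table I."""
    name: str
    static: bool
    contention_free: bool
    redundant_internal_clock: bool


# Rows transcribed from Table I (Static / Contention-Free / Redundant Internal Clock).
FLIP_FLOPS = [
    FlipFlop("SCDFF", static=True, contention_free=True, redundant_internal_clock=False),
    FlipFlop("TGFF", static=True, contention_free=True, redundant_internal_clock=True),
    FlipFlop("ACFF", static=True, contention_free=False, redundant_internal_clock=False),
    FlipFlop("S2CFF", static=True, contention_free=True, redundant_internal_clock=True),
    FlipFlop("TCFF", static=True, contention_free=False, redundant_internal_clock=True),
    FlipFlop("CSFF", static=False, contention_free=True, redundant_internal_clock=False),
    FlipFlop("TSPC18", static=True, contention_free=False, redundant_internal_clock=True),
]


def meets_all(ff: FlipFlop) -> bool:
    """True if the FF is fully static, contention-free and free of redundant internal clocking."""
    return ff.static and ff.contention_free and not ff.redundant_internal_clock


print([ff.name for ff in FLIP_FLOPS if meets_all(ff)])  # ['SCDFF']
```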

To examine voltage scalability, 10k Monte Carlo simulations (global+local) were run under each operating condition to check functionality. No single sizing strategy is optimal for all FFs at NTV, so to make the comparison fair, a method similar to that reported in [8] was used, with standard transistor sizes ($0.2\mu\text{m}$ PMOS and $0.1\mu\text{m}$ NMOS). The results show that TGFF, S2CFF and SCDFF are robust, maintaining functionality down to $\sim 0.3V$, while the other FFs fail even at full VDD as a result of contention or dynamic state nodes. At super-threshold voltages, the reliability of FFs with unreliable characteristics can be improved by transistor upsizing at the cost of higher power consumption, as can be seen in Fig.6. But as the voltage is scaled down to NTV, the impact of transistor mismatch becomes extremely large [1-2], and only FFs with inherently reliable characteristics can function.
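To put a 10k-sample pass/fail check in perspective: if all $n$ independent samples pass, the per-sample failure probability $p$ is bounded at confidence $c$ by solving $(1-p)^n = 1-c$, which is close to the familiar "rule of three" ($p \lesssim 3/n$ at 95%). The Python sketch below is a generic statistical aside, not part of the authors' flow; it assumes independent samples with zero observed failures, and the helper name `zero_failure_upper_bound` is hypothetical.

```python
def zero_failure_upper_bound(n_samples: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on the failure probability p when all
    n_samples independent trials pass: solve (1 - p) ** n = 1 - confidence."""
    if n_samples <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("need n_samples > 0 and 0 < confidence < 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_samples)


# Assumes no failures among 10k Monte Carlo samples at a given operating point.
p_mc = zero_failure_upper_bound(10_000)
print(f"10k MC samples: p < {p_mc:.2e} (rule of three: {3 / 10_000:.2e})")

# The same bound applied per die to the 100 measured dies, assuming all pass.
p_die = zero_failure_upper_bound(100)
print(f"100 dies: p < {p_die:.3f}")
```

The bound only says how rare a failure could still be given the sample size; it does not model which mechanism (contention, dynamic nodes) would cause it.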

### B. Test Chip Implementation

To evaluate TGFF, S2CFF, SCDFF and ACFF, a test chip was fabricated in a 28nm LP process, as shown in Fig.4. The same transistor sizes are used as in the earlier simulations; however, ACFF is aggressively sized to avoid slave latch contention: the NMOS involved in the contention is sized up by 3x, providing functionality down to 0.6V in the Monte Carlo simulations (Fig.6). To test for robustness across process variation, 100 dies, 20 from each corner (NN, SF, FS, SS, FF), were measured.

Fig.4: Die Micrograph

Fig.5: Power, energy and minimum functional voltage measured at room temperature

### C. Power / Reliability Measurement

Fig.5 shows the measured total power, energy and minimum operating voltages of the FFs. Thanks to its differential structure without redundant clocking, SCDFF achieves a clock power reduction of 64.3%/65.8% compared with TGFF at 0% activity ratio ($\alpha$) and 1V/0.4V supply voltage, respectively. Meanwhile, ACFF shows a 58.9% power reduction compared with TGFF at 1V, but it loses functionality at 0.4V. At $\alpha=10\%$, a typical operating condition, SCDFF provides a 55.8%/56.7% power reduction compared with TGFF at 1V/0.4V. Under the same $\alpha=10\%$, ACFF consumes more power than SCDFF at 1.0V because of its upsized transistors. Below 0.46V, ACFF cannot function even with aggressive sizing due to contention. This result demonstrates that sizing alone is not sufficient to overcome contention under the large variation at NTV. In contrast, TGFF, S2CFF and SCDFF all function at 0.3V, and SCDFF remains functional down to 0.28V.
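The $\alpha=10\%$, 1.0V reduction figures follow directly from the measured powers in Table I. The Python sketch below recomputes them; `POWER_UW` and `reduction_pct` are illustrative names, and only the four FFs measured on the test chip are included.

```python
# Measured total power at alpha = 10%, 1.0 V, 1 GHz, transcribed from Table I (microwatts).
POWER_UW = {"SCDFF": 2.05, "TGFF": 4.64, "ACFF": 2.42, "S2CFF": 3.38}


def reduction_pct(ref: str, ff: str, table: dict = POWER_UW) -> float:
    """Percentage power reduction of `ff` relative to the reference flip-flop `ref`."""
    return 100.0 * (1.0 - table[ff] / table[ref])


for name in ("SCDFF", "ACFF", "S2CFF"):
    print(f"{name}: {reduction_pct('TGFF', name):.1f}% lower power than TGFF")
```

The SCDFF line reproduces the 55.8% quoted above, and the ACFF reduction comes out smaller than the SCDFF's, consistent with ACFF's higher power from upsizing.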

### D. Comparison with Prior Art: Effective Clock Load Calculation and Reliability vs. Power Tradeoff

The effective number of clock-loaded transistors ($N_{clk}$) is directly related to clock power consumption. As shown in Fig.6, it is calculated as the sum of the effective external load ($N_{ext} \times \alpha_{ext}$, where $\alpha_{ext}=1$) and the effective internal load ($N_{int} \times \alpha_{int}$, where $\alpha_{int}$ is 0.5 for S2CFF, 1 for TGFF, and 0 for ACFF/SCDFF owing to the elimination of redundant internal toggling). An extra overhead term, $k(S-1)$, is added for FFs with contention or non-static operation, where upsizing by a factor $S$ is required to enhance reliability, so that $N_{clk} = N_{ext}\alpha_{ext} + N_{int}\alpha_{int} + k(S-1)$. For example, a 3$\times$ size-up ($S=3$) is used for ACFF to ensure its functionality at 0.6V. As previously mentioned, this only works in the super-threshold voltage regime and does not work in the NTV region, as can be seen in the measurement results.

Fig.6: Effective clock load modeling and reliability vs power tradeoff
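The effective clock load model can be evaluated directly. The Python sketch below uses the A + B + C entries of Table I; `n_clk` is an illustrative helper, $N_{int}=5$ for S2CFF is inferred from its table entry of 2.5 with $\alpha_{int}=0.5$, and because the text does not state the ACFF coefficient $k_1$, the loop sweeps a few hypothetical values.

```python
def n_clk(n_ext: float, n_int: float, alpha_int: float,
          k: float = 0.0, s: float = 1.0) -> float:
    """Effective clock-loaded transistor count, following the A + B + C model:
    A = N_ext * alpha_ext (alpha_ext = 1), B = N_int * alpha_int, C = k * (S - 1)."""
    alpha_ext = 1.0
    return n_ext * alpha_ext + n_int * alpha_int + k * (s - 1)


# Values read from Table I.
print("TGFF ", n_clk(2, 10, 1.0))   # 2 + 10 + 0
print("S2CFF", n_clk(5, 5, 0.5))    # 5 + 2.5 + 0 (N_int = 5 inferred)
print("SCDFF", n_clk(6, 0, 0.0))    # 6 + 0 + 0 (internal term vanishes since alpha_int = 0)

# ACFF with S = 3; k_1 is not given in the text, so these values are hypothetical.
for k1 in (0.5, 1, 2):
    print(f"ACFF, k_1={k1}:", n_clk(4, 0, 0.0, k=k1, s=3))
```

With $S=3$, the ACFF load $4 + 2k_1$ exceeds the SCDFF's 6 whenever $k_1 > 1$.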

The comparison table in Fig.7 shows the calculated $N_{clk}$. Although the proposed SCDFF presents 6 clock-loaded transistors to the clock driver, more than the other FFs free of redundant toggling, those FFs require transistor upsizing for reliability, which adds effective load. Therefore, the proposed SCDFF has the fewest effective clock-loaded transistors and hence the lowest power consumption. The table also shows that, thanks to its symmetric differential structure, the SCDFF incurs no area penalty and achieves a significantly smaller setup-hold window than the TGFF.
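The setup-hold window in Table I equals the sum of the measured setup and hold times, so a negative hold time shrinks the window. The Python sketch below reproduces the 1.0V window values from the setup and hold rows; `TIMING_PS` and `setup_hold_window` are illustrative names.

```python
# Measured timing at 1.0 V, transcribed from Table I: (setup time, hold time) in ps.
TIMING_PS = {
    "SCDFF": (69.9, -53.5),
    "TGFF": (21.8, 15.2),
    "ACFF": (86.4, -62.8),
    "S2CFF": (47.8, 28.2),
}


def setup_hold_window(setup_ps: float, hold_ps: float) -> float:
    """Width of the interval around the clock edge in which D must be stable."""
    return setup_ps + hold_ps


for name, (t_su, t_h) in TIMING_PS.items():
    print(f"{name}: {setup_hold_window(t_su, t_h):.1f} ps")
```

The SCDFF window (16.4 ps) is less than half of the TGFF window (37.0 ps) even though its setup time alone is longer, because its hold time is strongly negative.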

## IV. CONCLUSIONS

A differential flip-flop with static and contention-free operation and no redundant internal clock toggling is presented. With its variation-tolerant design, the proposed SCDFF achieves significant power reduction and aggressive voltage scalability, making it an ideal solution for low-voltage, low-power applications such as IoT systems and miniature sensor systems.

## ACKNOWLEDGEMENT

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (2019R1A2C4070438). The EDA tool was supported by the IC Design Education Center (IDEC), Korea.

## REFERENCES

- [1] M. Alioto, "Ultra-Low Power VLSI Circuit Design Demystified and Explained: A Tutorial," *IEEE Trans. on Circuits and Systems I: Regular Papers*, vol. 59, no. 1, pp. 3–29, Jan. 2012.
- [2] A. Wang, B. Calhoun, and A. Chandrakasan, *Sub-Threshold Design for Ultra Low-Power Systems*, Springer, 2006.
- [3] H. Kaul, M. Anders, S. Hsu, A. Agarwal, R. Krishnamurthy, and S. Borkar, "Near-threshold voltage (NTV) design – Opportunities and challenges," *Design Automation Conference (DAC)*, pp. 1149–1154, 2012.
- [4] V. De, S. Vangal, and R. Krishnamurthy, "Near Threshold Voltage (NTV) Computing: Computing in the Dark Silicon Era," *IEEE Design & Test*, vol. 34, no. 2, pp. 24–30, Apr. 2017.
- [5] C.-K. Teh, et al., "A 77% Energy-Saving 22-Transistor Single-Phase Clocking D-Flip-Flop with Adaptive-Coupling Configuration in 40nm CMOS," *International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, pp. 338–339, 2011.
- [6] Y. Kim, et al., "A static contention-free single-phase-clocked 24T flip-flop in 45nm for low-power applications," *International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, pp. 466–467, 2014.
- [7] Y. Cai, et al., "Ultra-low power 18-transistor fully static contention-free single-phase clocked flip-flop in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 2, pp. 550–559, Feb. 2019.
- [8] N. Kawai, et al., "A fully static topologically-compressed 21-transistor flip-flop with 75% power saving," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 11, pp. 2526–2533, Nov. 2014.
- [9] V. L. Le, et al., "A 0.4-V, 0.138-fJ/cycle single-phase-clocking redundant-transition-free 24T flip-flop with change-sensing scheme in 40-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 10, pp. 2806–2817, Oct. 2018.

|                                                                 | SCDFF        | TGFF         | ACFF [5]       | S2CFF [6]     | TCFF [8]       | CSFF [9]       | TSPC18 [7]     |
|-----------------------------------------------------------------|--------------|--------------|----------------|---------------|----------------|----------------|----------------|
|                                                                 | This Work    | Std.Cell     | ISSCC 2011     | ISSCC 2014    | JSSC 2014      | JSSC 2018      | JSSC 2019      |
| <b>Static</b>                                                   | Yes          | Yes          | Yes            | Yes           | Yes            | No             | Yes            |
| <b>Contention-Free</b>                                          | Yes          | Yes          | No             | Yes           | No             | Yes            | No             |
| <b>Redundant Internal Clock</b>                                 | No           | Yes          | No             | Yes           | Yes            | No             | Yes            |
| <b>Transistor Count</b>                                         | 26           | 24           | 22             | 24            | 21             | 24             | 18             |
| <b>Effective Clock TR # ($N_{\text{clk}}$<sup>(1)</sup>)</b>     | $6 + 0 + 0$  | $2 + 10 + 0$ | $4+0+k_1(S-1)$ | $5 + 2.5 + 0$ | $3+0+k_2(S-1)$ | $5+0+k_3(S-1)$ | $4+2+k_4(S-1)$ |
| <b>Power @ 10%, 1.0V, 1GHz</b>                                  | 2.05 $\mu$ W | 4.64 $\mu$ W | 2.42 $\mu$ W   | 3.38 $\mu$ W  |                |                |                |
| <b>Measured Min. VDD<sup>(2)</sup></b>                          | 0.28 V       | 0.30 V       | 0.46 V         | 0.30 V        |                |                |                |
| <b>Measured C-Q Delay @ 1.0V</b>                                | 121.4 ps     | 116.4 ps     | 79.4 ps        | 138.5 ps      |                |                |                |
| <b>Measured Setup Time @ 1.0V</b>                               | 69.9 ps      | 21.8 ps      | 86.4 ps        | 47.8 ps       |                |                |                |
| <b>Measured Hold Time @ 1.0V</b>                                | -53.5 ps     | 15.2 ps      | -62.8 ps       | 28.2 ps       |                |                |                |
| <b>Setup-Hold Window @ 1.0V</b>                                 | 16.4 ps      | 37.0 ps      | 23.6 ps        | 76.0 ps       |                |                |                |
| <b>Normalized Layout Size</b>                                   | 1.00         | 1.00         | 1.11           | 1.00          |                |                |                |

Table I. Summary of measurement results and comparison with prior art

1) A+B+C: A – load seen by clock driver  
B – load due to redundant internal clock  
C – additional load due to size up  
2) For 100 dies total, 20 dies in each corner (NN, FF, SS, FS, SF)