



# High-Speed CMOS Frequency Divider

**EECG271**

| Name                     | Section | ID       |
|--------------------------|---------|----------|
| Ahmed Hassan Abdelhaleem | 1       | 91240075 |
| Abdelrahman Ahmed Atta   | 2       | 91240410 |
| Marwan Mohamed Elsaid    | 3       | 91241059 |
| Mohamed Diaa Eldin       | 3       | 91240665 |
| Moaz Ahmed Mohamed       | 4       | 91240733 |

Supervised by and presented to Dr. Ahmed Yasser

## **Table of contents:**

Abstract

### **2. Literature Review**

- 2.1. Low Voltage and Low Power Divide-By-2/3 Counter Using Pass Transistor Logic Circuit Technique
- 2.2. Design and Analysis of Ultra Low Power TSPC 2/3 Prescaler
- 2.3. Comparative Summary

### **3. Design Requirements**

- 3.1. Functionality
- 3.2. Maximum Operating Frequency ( $F_{CLK,max}$ )
- 3.3. Figure of Merit (FOM)

### **4. Design Methodology**

- 4.1. Logic Style
- 4.2. Circuit Schematic and Transistor Sizing
- 4.3. Cadence ADE Setup and Measurement Techniques

### **5. Testbench and Verification**

- 5.1. Testbench Description
- 5.2. Simulation Results
- 5.3. Figure of Merit (FOM) Analysis

### **6. Bonus: Dual-Modulus Frequency Divider**

- 6.1. Utilizing M-TSPC and Required Control Logic
- 6.2. Test-Bench and Simulation
- 6.3. Potential Improvement Paths

### **7. Discussion**

- 7.1. Innovation
- 7.2. Limitations and Trade-offs

### **8. Conclusion**

### **9. References**

## Abstract

Frequency dividers are fundamental components in modern digital and mixed-signal systems, enabling clock generation, synchronization, and frequency scaling. High-speed CMOS dividers are especially critical due to their combination of fast switching, low power, and integration-friendly scalability. In deeply scaled nodes like 65 nm, designing reliable divide-by-2 circuits at multi-GHz rates presents challenges including reduced supply voltages, tighter noise margins, and increased process variability. Careful device sizing, logic style choice, and transient optimization are essential to achieve a robust, energy-efficient, high-frequency divider.

The primary objective of this project is to design and simulate a high-speed CMOS digital frequency divider circuit using a 65 nm process design kit. The main design focuses on a robust divide-by-2 operation targeting input frequencies  $\geq 2$  GHz, with the goals of minimizing power consumption and maximizing the figure of merit (FOM), defined as shown in equation.1.

A secondary, dual-modulus divider (divide-by-2 or divide-by-3) is also implemented to demonstrate programmable functionality, though its FOM is not included in the primary performance evaluation.

The report presents the circuit architecture, logic style selection, transistor sizing methodology, simulation strategy in Cadence Virtuoso, and transient validation. Performance metrics include functionality verification, maximum operating frequency ( $F_{CLK,MAX}$ ), and power efficiency.

The designed divide-by-2 circuit achieved a maximum operating frequency of $F_{CLK,MAX} = 2$ GHz, with an average total power consumption of $P_{total} = 362.9$ nW, yielding a figure of merit of $FOM = 5511 \frac{\text{GHz}}{\text{mW}}$. The output waveform, fig.7, shows clean divide-by-2 behavior over an 8 ns simulation window. The bonus dual-modulus divider demonstrates programmable division (divide-by-2 or divide-by-3), verified through transient simulation; its FOM was not calculated.

## 2. Literature Review

Frequency dividers are essential building blocks in frequency synthesizers, operating at the highest frequencies and consuming significant power within these systems. Two prominent approaches for implementing high-performance divide-by-2/3 prescalers are examined: the Ultra Low Power True Single-Phase Clock (TSPC) CMOS 2/3 Prescaler and the Low Voltage Pass Transistor Logic-based E-TSPC design.

### 2.1. Low Voltage and Low Power Divide-By-2/3 Counter Using Pass Transistor Logic Circuit Technique

Hwang and Lin's TSPC-based 2/3 prescaler, shown in fig.1, presents a high-speed dynamic CMOS solution optimized for multi-GHz clock division. The design uses True Single-Phase Clocking (TSPC) to eliminate the need for multi-phase clocks, reducing both design complexity and power. A key innovation is the **power-gating technique**, which disconnects the first flip-flop during divide-by-2 operation, yielding substantial power savings without degrading speed.



Fig.1. MOS schematic of proposed E-TSPC-based divide-by-2/3 counter design with pass transistor logic circuit technique.

Silicon results demonstrate operation up to 4.9 GHz at 1.8 V with 1.35 mW power, representing a significant improvement over earlier E-TSPC implementations. The architecture also shows reduced sensitivity to clock amplitude, allowing direct VCO coupling, an important advantage in PLL applications. Transistor count is reduced (23 vs. 32), leading to lower layout area and improved efficiency.

#### Key takeaway:

TSPC is suitable for high-speed frequency dividers operating in the multi-GHz range, especially where moderate power is acceptable and compatibility with dynamic logic is beneficial.

**At the opposite end of the design spectrum, Krishna et al. target minimum-voltage and ultra-low-power operation through a pass-transistor-based E-TSPC architecture.**

### 2.2. Design and Analysis of Ultra Low Power TSPC 2/3 Prescaler

Krishna et al. propose an ultra-low-power prescaler based on E-TSPC flip-flops and pass-transistor logic, shown in fig.2. This design minimizes transistor stacking and replaces conventional logic gates with a single pMOS pass device, drastically reducing area and enabling operation at very low supply voltages.

The prescaler operates down to 0.6 V, achieving 531 MHz at minimum voltage and up to 2.98 GHz at 0.9 V, with only 4–5 $\mu$W total power consumption. Layout area is reduced by over 20%, and power-delay-product (PDP) is significantly improved compared to classical E-TSPC designs [6].

However, this architecture targets low-voltage, low-power operation rather than absolute maximum frequency. The reduced voltage swing and reliance on pass-transistor conduction limit high-frequency performance and noise margins, making it less attractive for multi-GHz synthesizers.

#### Key takeaway:

Pass-transistor E-TSPC is ideal for energy-constrained, low-voltage systems, but is not well-suited to high-speed applications above  $\sim$ 3 GHz.

### 2.3. Comparative Summary

The TSPC design prioritizes high-speed operation with aggressive power reduction techniques suitable for frequency synthesizers operating in the multi-GHz range with moderate power budgets. Conversely, the pass transistor logic E-TSPC design sacrifices ultimate speed to achieve exceptional low-voltage operability and circuit simplicity, as shown in table.1.

Table.1. Comparative summary between TSPC and Pass-Transistor E-TSPC designs

| Performance Metric                | TSPC Design                                      | Pass-Transistor E-TSPC                            |
|-----------------------------------|--------------------------------------------------|---------------------------------------------------|
| **Maximum Frequency**             | 4.9 GHz @ 1.8 V                                  | 2.98 GHz @ 0.9 V                                  |
| **Power Consumption**             | 1.35 mW @ 4.8 GHz                                | 4.35–4.61 $\mu$W                                  |
| **Minimum Supply Voltage**        | $\sim$1.8 V                                      | 0.6 V                                             |
| **Transistor Count**              | 23–32                                            | 13 (1 for control)                                |
| **Logic Style**                   | Dynamic CMOS (TSPC)                              | E-TSPC + Pass Transistor                          |
| **Area Efficiency**               | Moderate (dynamic cells)                         | Very high (minimal logic)                         |
| **Clock Sensitivity**             | Low (stable across amplitudes)                   | High (reduced swing affects noise margin)         |
| **Noise Immunity**                | Moderate                                         | Lower at low voltages                             |
| **FOM (Speed/Power)**             | Good at GHz range                                | Excellent at sub-GHz / low-V                      |
| **Design Complexity**             | Moderate (power gating, dynamic behavior)        | Low at circuit level, high for sizing correctness |
| **Target Applications**           | High-speed PLLs and GHz synthesizers             | Ultra-low-power, battery-driven SoCs              |
| **Suitability for 65 nm Project** | Highly suitable (project target $\geq 2$ GHz)    | Less suitable for $\geq 2$ GHz target             |

The comparison highlights a fundamental trade-off: TSPC excels in high-frequency, high-throughput environments, while pass-transistor E-TSPC offers unmatched power efficiency at the cost of maximum speed and noise robustness.

## 3. Design Requirements

The frequency divider design is evaluated against three primary requirements: (1) functional correctness under all operating conditions, (2) maximum achievable operating frequency, and (3) overall efficiency expressed through the Figure of Merit (FOM). Each requirement reflects a critical performance dimension for high-speed CMOS dividers.

### 3.1. Functionality

Functional correctness ensures that the circuit reliably performs divide-by-2 or dual-modulus divide-by-2/3 operation across all input frequencies, supply voltages, and load conditions. In synchronized systems, any functional glitch propagates into clock multiplication chains and degrades overall system stability.

To meet this requirement, the design integrates the following verification and architectural strategies:

- **Correct logic topology selection:**  
A classic D-flip-flop-based regenerative divider ensures stable toggle behavior for divide-by-2, while an FSM-based control path enables mode selection for divide-by-3 in the dual-modulus prescaler.
- **Transient-domain validation:**  
Time-domain simulations in Cadence Virtuoso verify correct output transitions, duty cycle behavior, setup/hold margin compliance, and metastability robustness.
- **Process–voltage–temperature (PVT) robustness:**  
Functional validation across typical, fast, and slow corners ensures stable behavior under process variation.
- **Clock integrity considerations:**  
The design ensures proper clock buffering, controlled rise/fall times, and minimal clock skew within the divider loop.

### 3.2. Maximum Operating Frequency ( $F_{CLK,max}$ )

This metric determines how high an input frequency the divider can reliably process. In GHz-range synthesizers,  $F_{CLK,max}$  is often the limiting factor that sets the upper bound for VCO frequency, prescaler performance, and overall bandwidth. Achieving a higher  $F_{CLK,max}$  directly expands the usable frequency spectrum of the system.

Achieving a high  $F_{CLK,max}$  requires a combination of device-level and architectural optimizations:

- **Transistor sizing optimization:**  
Device widths are carefully selected to balance drive strength ($g_m$) and parasitic loading. Larger devices increase speed but add capacitance, so an optimal trade-off is required.

- **Minimized critical path delay:**  
The logic path between the DFF's internal nodes is kept as short as possible, reducing propagation delay and enhancing maximum toggle frequency.
- **Dynamic logic exploitation (TSPC-style behavior):**  
Leveraging dynamic behavior reduces the number of stacked transistors and shortens the evaluation path, enabling higher speed in nanometer CMOS.
- **Load-aware layout planning:**  
Careful placement, short interconnects, and reduced parasitic coupling improve transition time at critical nodes.
- **Simulation-based frequency sweeping:**  
A systematic transient analysis sweep determines the exact breakdown frequency where the divider begins failing, establishing a validated $F_{CLK,max}$.
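
The frequency-sweep step in the list above can be sketched in a few lines of Python. This is only an illustration: `find_f_clk_max` and `toy_check` are hypothetical names, and `toy_check` stands in for a transient simulation followed by a waveform check. Its 2.5 GHz threshold is a placeholder, not a result from this report.

```python
from typing import Callable, Optional, Sequence


def find_f_clk_max(freqs_hz: Sequence[float],
                   divides_correctly: Callable[[float], bool]) -> Optional[float]:
    """Return the highest swept frequency reached before the first failure."""
    f_max = None
    for f in sorted(freqs_hz):
        if not divides_correctly(f):
            break          # first breakdown frequency: stop the sweep here
        f_max = f
    return f_max


def toy_check(f_hz: float) -> bool:
    """Placeholder pass/fail criterion (assumption, not simulation data)."""
    return f_hz <= 2.5e9


sweep = [0.5e9 * k for k in range(1, 9)]   # 0.5 GHz to 4.0 GHz in 0.5 GHz steps
f_clk_max = find_f_clk_max(sweep, toy_check)
print(f_clk_max)
```

Stopping at the first failure, rather than taking the highest passing point, avoids reporting an isolated frequency that happens to work above a failing one.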

### 3.3. Figure of Merit (FOM)

The FOM, as defined in equation.1, captures the overall efficiency of the divider by correlating maximum speed with power consumption. A higher FOM indicates a design that not only achieves high frequency operation but does so with minimal energy overhead, an essential characteristic in modern portable and high-density SoC environments.

$$FOM = \frac{F_{CLK,MAX}(\text{GHz})}{\text{Power}_{\text{total}}(\text{mW})} \quad (1)$$

Where  $F_{CLK,MAX}$  is the maximum reliable operating frequency and  $\text{Power}_{\text{total}}$  is the average power consumption at that operating point.
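
As a quick numerical check of equation.1, the following minimal Python sketch converts a frequency in Hz and a power in W to the GHz/mW units of the FOM. The function name `figure_of_merit` is an illustrative assumption; the input values are the ones reported in the abstract.

```python
def figure_of_merit(f_clk_max_hz: float, power_total_w: float) -> float:
    """Return the FOM of equation 1 in GHz/mW."""
    if power_total_w <= 0:
        raise ValueError("power must be positive")
    f_ghz = f_clk_max_hz / 1e9      # Hz -> GHz
    p_mw = power_total_w * 1e3      # W  -> mW
    return f_ghz / p_mw


# Values reported in the abstract: 2 GHz and 362.9 nW
fom = figure_of_merit(2e9, 362.9e-9)
print(f"FOM = {fom:.0f} GHz/mW")
```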

The design incorporates several strategies to improve FOM:

- **Power-aware transistor sizing:**  
Devices are sized to minimize dynamic switching power without degrading high-speed operation.
- **Reduction of unnecessary switching nodes:**  
Logic stages are architected to minimize internal node transitions, reducing effective switching capacitance.
- **Balanced rise/fall behavior:**  
Symmetric transitions improve dynamic efficiency and reduce short-circuit current.
- **Mode-based power optimization (for dual-modulus):**  
Only the minimum logic necessary is activated depending on whether divide-by-2 or divide-by-3 mode is selected.
- **Measurement under realistic loading:**  
Power is extracted from Cadence simulations under typical and maximum-speed conditions to compute an accurate FOM.

## 4. Design Methodology

### 4.1. Logic Style

#### I. True Single-Phase Clock (TSPC)

The True Single-Phase Clock (TSPC) logic is a high-speed digital CMOS design technique that offers low power consumption and compact area utilization. Developed in the 1980s to address challenges in clock distribution for high-speed digital CMOS circuits, TSPC overcomes limitations of traditional dynamic logic, such as clock skew and the need for multiple clock phases, by using a single-phase clock. This approach enables faster, simpler circuits with fewer transistors and reduced power [2].

Functionally, a TSPC latch divides the circuit into precharge and evaluation stages. During the clock high phase, evaluation occurs, while the clock low phase retains the previous state. This careful scheduling avoids race conditions and ensures robust operation. Advanced TSPC designs, including merged pulse-swallowing logic, further enhance speed and robustness in divide-by-2 circuits by mitigating glitches and charge sharing.

##### Pros:

- High-speed operation suitable for GHz-range frequencies.
- Low dynamic power due to reduced transistor count and minimal static currents.
- Compact area footprint, advantageous in advanced processes such as 65 nm.
- Reduced phase noise compared to static CMOS gates, beneficial for communication circuits.

##### Cons:

- Sensitive to leakage currents and noise at low clock frequencies (<100 MHz).
- Potential race conditions require careful transistor sizing and clock management.
- Single-ended topology may induce voltage spikes on supply lines affecting adjacent circuits.

#### II. Current-Mode Logic (CML)

CML frequency dividers are widely used in high-speed applications, such as phase-locked loops (PLLs), due to their speed and signal integrity advantages. CML dividers primarily employ D flip-flop-based latches operating in current mode, which reduces voltage swings on internal nodes, providing faster switching and lower noise.

Typical CML implementations consist of cascaded master-slave D flip-flops using differential pairs with constant tail currents to maintain steady operation and minimize signal distortion. Advanced designs separate tail currents for latch and tracking circuits, improving rise/fall times and reducing latency and jitter [3].

##### Pros:

- Very high-speed operation, suitable for multi-GHz frequency synthesis.
- Improved noise immunity due to reduced voltage swings and differential signaling.
- Compatible with differential architectures, enhancing noise margins.

##### Cons:

- Higher power consumption relative to static CMOS because of constant tail currents.
- Increased design complexity in managing current sources and transistor sizing.
- Large parasitic capacitances can limit fan-out and maximum frequency.

#### III. Transmission Gate / Static CMOS

Transmission Gate (TG) and static CMOS logics are conventional approaches characterized by fully complementary transistor pairs and symmetrical structure. Design optimizations focus on reducing power via careful transistor sizing, improving switching speed by minimizing internal node capacitances, and enhancing robustness through layout techniques. Static CMOS dividers consume negligible static power except for leakage currents and are simpler to design. Transmission gates provide bidirectional low-resistance switches, reducing signal distortion when cascading stages at high frequencies [4]. Additional optimizations, such as gate capacitance balancing and threshold matching, help achieve faster switching and reduce delay jitter.

##### Pros:

- Low static power dissipation and simple design framework.
- Good noise immunity due to full rail-to-rail voltage swings.
- Effective for moderate to high frequencies with proper optimization.

##### Cons:

- Typically lower maximum frequency than CML or TSPC.
- Larger area due to complementary transistor pairs.
- Higher dynamic power at very high toggling frequencies.

Table.2. A comprehensive summary comparing three different logic styles for implementing a frequency divider across various aspects

| Logic Style      | Max Frequency | Power        | Area   | Noise Immunity | Suitability               |
|------------------|---------------|--------------|--------|----------------|---------------------------|
| TSPC             | High (GHz)    | Low          | Small  | Moderate       | High-speed divide-by-2    |
| CML              | Very High     | High         | Medium | High           | Multi-GHz PLL/dividers    |
| TG / Static CMOS | Moderate      | Low (static) | Large  | High           | Moderate-high frequencies |

For this project, TSPC logic was selected for the divide-by-2 frequency divider due to its combination of high-speed operation, low dynamic power, and compact area, which aligns with the design goals of achieving robust operation above 2 GHz while maximizing FOM, as shown in table.2. The CML and TG/static CMOS approaches were considered for comparison but were not implemented due to either higher power (CML) or lower maximum frequency (TG/static CMOS).

### 4.2. Circuit Schematic and Transistor Sizing

The implemented divide-by-two frequency divider is based on a True Single-Phase Clock (TSPC) flip-flop configured in toggle mode. In this configuration, the inverted output  $Q_{\text{BAR}}$  is routed back to the input of the dynamic logic stage, ensuring that the output bit flips on every rising edge of the clock. This produces a precise divide-by-two operation while maintaining the clocking simplicity that characterizes TSPC logic.
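
The toggle configuration can be illustrated with a small behavioural model. This is a logic-level sketch, not a model of the transistor-level TSPC stages; the function name `divide_by_two` and the sampled-clock representation are assumptions made for illustration.

```python
def divide_by_two(clk_samples):
    """Behavioural toggle flip-flop: D is tied to Q_BAR, so Q flips on
    every rising clock edge."""
    q = 0
    prev = 0
    out = []
    for clk in clk_samples:
        if prev == 0 and clk == 1:   # rising edge detected
            q = 1 - q                # capture D = Q_BAR
        out.append(q)
        prev = clk
    return out


clk = [0, 1] * 4                     # four input clock periods
q_out = divide_by_two(clk)
print(q_out)                         # [0, 1, 1, 0, 0, 1, 1, 0]
```

Four input periods produce two output periods, which is the divide-by-two ratio described above.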

The internal structure of the flip-flop consists of a dynamic precharge–evaluate stage, a regeneration stage, and a final output inverter. During the low phase of the clock, the pMOS precharge device forces the internal dynamic node to VDD. When the clock transitions high, the precharge transistor turns off and the nMOS evaluation network selectively discharges this node depending on the logic value fed back from  $Q_{\text{BAR}}$ . The resulting voltage on the dynamic node is then latched and regenerated by the static inverter pair, producing clean digital levels for both Q and  $Q_{\text{BAR}}$ .

A significant design decision involved substituting the last inverter with low-threshold (LVT) devices. Because operation at 0.42 V places severe limits on the available gate overdrive ( $V_{GS} - V_{th}$ ), standard-threshold devices no longer provide sufficient current to regenerate the output at multi-GHz speeds. The LVT inverter restores adequate drive capability at low supply values, improving both rise and fall times and enabling fully restored logic swings at the output even when the internal evaluation devices operate close to subthreshold. Device widths were tuned carefully to weaken unused precharge paths (reducing dynamic capacitance) while strengthening the output regeneration stage, achieving a balance between high-frequency performance and power efficiency.

The transistor sizes and types, listed in table.3, were adjusted to achieve the following objectives:

- **Speed at low VDD:**  
Wider devices were chosen in the output stage and evaluation nMOS path to ensure high drive capability at chosen VDD of 0.42 V.
- **Reduced dynamic capacitance:**  
Internal dynamic nodes were kept at minimum width to reduce precharge energy and switching capacitance.

- **Symmetric rise/fall times:**  
pMOS and nMOS width ratios were tuned to maintain an acceptable output duty cycle at multi-GHz operation. However, the final sizing favors the FOM over the duty cycle, i.e. the sizes were chosen to maximize the FOM. If symmetric delay is desired, the pMOS $\frac{W}{L}$ ratio should be approximately double that of the nMOS, since $\mu_n$ is roughly double $\mu_p$; this makes the pull-up and pull-down on-resistances approximately equal and therefore gives symmetric delay.
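
As a first-order check on the factor of two quoted for symmetric delay, the following sketch uses the long-channel, deep-triode approximation for on-resistance. The symbols $C_{ox}$ and $V_{ov}$ (gate overdrive, assumed equal for both device types) are introduced here for illustration, and the result is only approximate near threshold, where this divider operates.

$$R_{on} \approx \frac{1}{\mu C_{ox} \frac{W}{L} V_{ov}} \quad\Rightarrow\quad R_{on,p} = R_{on,n} \iff \frac{(W/L)_p}{(W/L)_n} = \frac{\mu_n}{\mu_p} \approx 2$$

With equal $\frac{W}{L}$ for both device types, and ignoring the threshold differences of the LVT devices, this model predicts a pull-up roughly twice as resistive as the pull-down, which is the asymmetry that the FOM-oriented sizing accepts.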

Table 3. The sizing and types of the transistors used in the M-TSPC

| Transistor name | Type    | Width | Length | $\frac{W}{L}$ ratio |
|-----------------|---------|-------|--------|---------------------|
| M0              | nch_lvt | 200n  | 60n    | $\frac{10}{3}$      |
| M1              | pch     | 200n  | 60n    | $\frac{10}{3}$      |
| M2              | pch     | 200n  | 60n    | $\frac{10}{3}$      |
| M3              | pch     | 200n  | 60n    | $\frac{10}{3}$      |
| M4              | pch_lvt | 200n  | 60n    | $\frac{10}{3}$      |
| M5              | pch     | 200n  | 60n    | $\frac{10}{3}$      |
| M6              | pch     | 200n  | 60n    | $\frac{10}{3}$      |
| M7              | nch     | 200n  | 60n    | $\frac{10}{3}$      |
| M8              | nch     | 200n  | 60n    | $\frac{10}{3}$      |
| M9              | nch     | 200n  | 60n    | $\frac{10}{3}$      |
| M10             | nch     | 200n  | 60n    | $\frac{10}{3}$      |
| M11             | nch_lvt | 200n  | 60n    | $\frac{10}{3}$      |

The final schematic, fig.3, shows the transistor-level implementation after integrating all design refinements, including LVT usage, optimized sizing, and a more power-efficient TSPC front-end.



Fig.3. Final schematic of the modified True Single-Phase Clock (M-TSPC)

### 4.3. Cadence ADE Setup and Measurement Techniques

The evaluation of circuit performance required a structured measurement flow. In Cadence ADE, two primary parameters were swept systematically, fig.4: supply voltage (VDD) and clock frequency ( $F_{CLK}$ ). By constructing a two-dimensional sweep, it became possible to map out the precise boundary between functional and failing operation across the entire design space. This approach was crucial for determining the absolute lowest voltage at which the circuit could still operate at or above the 2-GHz target frequency.

Combinations of frequency and supply voltage other than those shown were also swept; however, at those points the output clock had an unacceptable shape and duty cycle, as shown in fig.5, so they were rejected.



Fig.4. Snippet from the ADE XL screen while sweeping VDD and  $F_{CLK}$  to get max FOM



Fig.5. Unacceptable combinations of VDD and  $F_{CLK}$  due to long delays and ripples
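
The selection step of the two-dimensional sweep can be sketched as follows: keep only the points that pass the waveform check at or above the target frequency, then pick the one with the highest FOM. The function name `best_operating_point`, the dictionary keys, and every number in `results` are placeholders invented for illustration; they are not simulation data from this report.

```python
def best_operating_point(results, f_target_hz):
    """Return the passing sweep point at or above the target frequency
    with the highest FOM (GHz/mW), or None if no point qualifies."""
    candidates = [r for r in results
                  if r["passed"] and r["f_clk_hz"] >= f_target_hz]
    if not candidates:
        return None
    return max(candidates,
               key=lambda r: (r["f_clk_hz"] / 1e9) / (r["p_avg_w"] * 1e3))


# Placeholder sweep points (hypothetical values, for illustration only)
results = [
    {"vdd": 0.40, "f_clk_hz": 2e9, "p_avg_w": 0.30e-6, "passed": False},
    {"vdd": 0.50, "f_clk_hz": 2e9, "p_avg_w": 0.60e-6, "passed": True},
    {"vdd": 0.60, "f_clk_hz": 2e9, "p_avg_w": 0.90e-6, "passed": True},
    {"vdd": 0.60, "f_clk_hz": 1e9, "p_avg_w": 0.45e-6, "passed": True},
]
best = best_operating_point(results, f_target_hz=2e9)
print(best["vdd"])
```

In this toy data set the lowest supply voltage fails the waveform check, so the next passing voltage gives the best FOM, mirroring the way the rejected combinations of fig.5 bound the usable design space.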

Power was measured by averaging the product of the supply voltage and the instantaneous current drawn from the supply over multiple cycles, equation.2. Because VDD is constant, this average is straightforward to evaluate, and it yields a value suitable for calculating both the energy per cycle and the figure of merit (FOM).

$$\text{Power} = \lim_{T \rightarrow \infty} \frac{1}{2T} \int_{-T}^{T} VDD \cdot I(t)\, dt = \text{average}(VDD \cdot I_{\text{transient}}) \quad (2)$$

Additional ADE expressions were created to extract the clock-to-Q delay and the output duty cycle, equation.3, to verify that the output transitions reached acceptable voltage levels, and to calculate the FOM, equation.4.

$$\text{Duty Cycle} = \frac{V_Q}{V_{\text{out}}} \quad (3)$$

$$\text{FOM} = \text{abs}\left(\frac{10^{-9}}{T_{CLK}} \div (\text{average}(VDD * I_{\text{transient}}) * 1000)\right) \frac{\text{GHz}}{\text{mW}} \quad (4)$$
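
Outside ADE, the averaging in equation.2 can be reproduced on exported samples of the supply current. The sketch below is a minimal pure-Python version using trapezoidal integration over a finite window; the function name `average_power` and the synthetic current pulse are assumptions for illustration, not data from the simulations.

```python
def average_power(t, i_dd, vdd):
    """Time-average of VDD * I(t) over the sampled window (trapezoidal rule)."""
    if len(t) != len(i_dd) or len(t) < 2:
        raise ValueError("need at least two matching time/current samples")
    energy = 0.0
    for k in range(1, len(t)):
        dt = t[k] - t[k - 1]
        energy += 0.5 * vdd * (i_dd[k] + i_dd[k - 1]) * dt   # joules in this step
    return energy / (t[-1] - t[0])                            # watts


# Synthetic triangular current pulse inside one 0.5 ns clock period
t = [0.0, 1e-10, 2e-10, 5e-10]       # seconds
i_dd = [0.0, 1e-6, 0.0, 0.0]         # amperes
p_avg = average_power(t, i_dd, vdd=1.0)
print(p_avg)
```

The window should span an integer number of output periods; otherwise a partial current pulse at either end biases the average.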

## 5. Testbench and Verification

### 5.1. Testbench Description

All simulations were performed in Cadence Virtuoso using the Spectre transient analysis engine. The testbench, shown in fig.6, includes a pulse-based clock generator capable of sweeping the input frequency from the sub-GHz range up to beyond 3 GHz, enabling systematic evaluation of the divider's maximum operating frequency. The supply voltage source was configured to allow sweeping of  $V_{DD}$  across a wide operating range, from 1.2 V down to 0.3 V, to determine the minimum voltage at which correct dynamic operation is maintained.

To ensure accurate characterization under realistic operating conditions, each simulation was run for an 8-ns transient window after the circuit reached steady state. During this window, all critical waveforms, including the input clock, the internal dynamic node, the output signals Q and $Q_{BAR}$, and the instantaneous supply current $I_{DD}(t)$, were captured. Monitoring both the dynamic node voltage and the supply current allowed verification that the TSPC front-end did not suffer from incomplete precharge, charge sharing, or leakage-induced droop, all of which become prominent concerns at reduced supply voltages.

This testbench configuration made it possible to identify the full functional envelope of the circuit, specifically the combinations of supply voltage and clock frequency for which the divider maintains clean toggling behavior. It also provided the necessary data for extracting power consumption, duty cycle, and the overall Figure of Merit (FOM).



Fig.6. The testbench schematic that includes the supply and input clock

### 5.2. Simulation Results

#### Functional Waveforms

The input clock and corresponding output waveforms are shown in Fig. 7, demonstrating correct divide-by-2 behavior over the 8-ns observation interval. The output Q toggles once for every two rising edges of the input clock, confirming proper operation of the TSPC toggle flip-flop. The internal dynamic node maintains full-swing behavior throughout this interval, indicating that precharge and evaluation operations remain reliable even at the reduced supply voltage.



Fig.7. The input clock waveform against the output waveform showing the divide-by-2 behavior of the circuit during 8ns transient analysis

#### Duty Cycle

The output is not perfectly symmetric due to the asymmetric rise/fall times characteristic of low-voltage dynamic logic. The measured duty cycle of the divided output is 47.59%, calculated using equation.3 and extracted at the 50% $V_{DD}$ threshold. This value is within acceptable limits for a standalone divide-by-two stage and is consistent with the expected behavior given the single-ended dynamic evaluation path and LVT-driven regeneration stage.
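
A threshold-based duty-cycle measurement of this kind can be sketched in Python as follows. The function name `duty_cycle`, the linear interpolation of crossings, and the synthetic waveform are assumptions for illustration; the ADE expression used in the project may differ.

```python
def duty_cycle(t, v, threshold):
    """Fraction of one period spent above `threshold`, measured between the
    first two rising crossings; crossings are linearly interpolated."""
    rises, falls = [], []
    for k in range(1, len(t)):
        a, b = v[k - 1] - threshold, v[k] - threshold
        if a < 0 <= b or a >= 0 > b:                      # sign change
            tc = t[k - 1] + (t[k] - t[k - 1]) * (-a) / (b - a)
            (rises if b > a else falls).append(tc)
    if len(rises) < 2:
        raise ValueError("need two rising crossings to define a period")
    t0, t1 = rises[0], rises[1]
    fall = next(f for f in falls if t0 < f < t1)          # falling edge in between
    return (fall - t0) / (t1 - t0)


# Synthetic waveform: high for 2 time units out of a 5-unit period
t = [float(k) for k in range(11)]
v = [0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0]
dc = duty_cycle(t, v, threshold=0.5)                      # threshold = 50% of swing
print(dc)                                                 # 0.4
```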

#### Current Consumption and Total Power

The instantaneous supply current waveform, shown in Fig.8, exhibits pronounced peaks during the rising edge of the input clock, corresponding to the TSPC precharge and evaluation events. The average supply current at the chosen operating point is extracted by integrating  $I_{DD}(t)$  over the 8-ns window. The resulting total power consumption is 362.9 nW at:

- $V_{DD} = 0.42 \text{ V}$
- $f_{CLK} = 2 \text{ GHz}$

This power level is consistent with expectations for an aggressively scaled dynamic circuit operating at high frequency.



Fig.8. Transient current of the circuit taken at the negative terminal of VDD

#### Operating Voltage and Maximum Frequency

A comprehensive sweep of supply voltage and frequency revealed that 0.42 V is the lowest voltage at which the divider remains fully functional at 2 GHz. Below this value, the nMOS evaluation path becomes too weak to discharge the dynamic node reliably within a half-cycle. At higher voltages (0.6–1.2 V), the circuit operates well into the multi-GHz range, but power increases accordingly, which degrades the FOM.

#### Summary of Key Performance Metrics

A compact summary of the main results is provided in Table.4.

Table.4. A comprehensive summary of the results obtained for the M-TSPC

| Metric             | Value       | Notes                           |
|--------------------|-------------|---------------------------------|
| **Supply Voltage** | 0.45 V      | Minimum VDD for 2-GHz operation |
| **Max Frequency**  | 2 GHz       | Verified functional at this VDD |
| **Duty Cycle**     | 47.59%      | Measured at 50%-VDD threshold   |
| **Avg Power**      | 362.9 nW    | From integrated IDD             |
| **Avg Current**    | 806.4 nA    | Derived from P/V                |
| **Functionality**  | Verified    | Divide-by-2 correct             |
| **FOM**            | 5511 GHz/mW | Computed from fmax / P          |

This frequency, although gives highest FOM,

### 5.3. Figure of Merit (FOM) Analysis

The Figure of Merit used for performance comparison is defined in equation.1.

Using the measured values ($f_{\text{max}} = 2 \text{ GHz}$, $P_{\text{avg}} = 362.9 \text{ nW} = 3.629 \times 10^{-4} \text{ mW}$), the resulting FOM is:

$$\text{FOM} = 5511 \text{ GHz/mW}$$

This unusually high FOM highlights the effectiveness of operating the divider in the ultra-low-voltage region, where dynamic power is reduced quadratically with VDD. The use of LVT devices in the final inverter stage ensures that the logic retains sufficient speed despite the aggressive supply scaling, enabling the design to reach multi-GHz operation without excessive power consumption. This balance between low voltage, low power, and high speed is a key differentiator compared to conventional TSPC dividers in the literature.
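
The quadratic dependence mentioned above follows from the standard first-order model of dynamic switching power. The activity factor $\alpha$ and switched capacitance $C_L$ are symbols introduced here for illustration; 1.2 V is taken as the upper end of the supply sweep described in section 5.1, and leakage and short-circuit power are ignored.

$$P_{dyn} = \alpha C_L V_{DD}^2 f_{CLK} \quad\Rightarrow\quad \frac{P_{dyn}(1.2\text{ V})}{P_{dyn}(0.42\text{ V})} = \left(\frac{1.2}{0.42}\right)^2 \approx 8.2$$

At a fixed clock frequency, this model predicts roughly an eightfold reduction in switching power, and hence a comparable gain in FOM, from supply scaling alone.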

The combination of (1) reduced VDD, (2) strategic use of LVT devices, and (3) an optimized TSPC topology therefore enables an extremely competitive FOM relative to reported implementations. This validates the design approach and confirms the circuit's suitability for low-power high-frequency prescaler stages.

This result was obtained using the sweep methodology described in section 4.3 while monitoring the output waveform to ensure that the output clock met the requirements for duty cycle, delay, and shape.

## 6. Bonus: Dual-Modulus Frequency Divider

### 6.1. Divide-by-three architecture

The dual-modulus frequency divider (DMFD) enables dynamic switching between integer divide ratios—specifically  $\div 2$  and  $\div 3$ —providing fine resolution in PLL frequency synthesis. In our design, both the  $\div 2$  and  $\div 3$  blocks are implemented using Modified True Single-Phase Clock (M-TSPC) flip-flops to exploit their high-speed dynamic behavior and minimal clocking overhead.

The divide-by-three ( $\div 3$ ) block follows the classical three-stage topology shown in Fig.9, where three M-TSPC D-flip-flops (A, B, and C) are interconnected through simple combinational feedback. The structure generates the canonical 0-1-2 counting sequence, and its internal waveforms (AQ, BQ, CQ) naturally produce a clean 50% duty-cycle output through the final OR/XOR stage. This implementation benefits directly from the short evaluation paths and high clock frequency tolerance intrinsic to the M-TSPC style.



Fig.9. The divide-by-three circuit schematic

Dual-modulus functionality is achieved with a single multiplexer, fig.10, placed after the divider blocks. The MUX selects either:

- the  $\div 2$  output ( $\text{MOD} = 0$ ), or
- the  $\div 3$  output ( $\text{MOD} = 1$ ).

Thus, instead of modifying the internal evaluation paths of the M-TSPC chain, the modulus selection occurs at the output stage, allowing extremely low overhead. This approach takes advantage of the fact that both the  $\div 2$  and  $\div 3$  dividers are always running, and the MUX simply forwards the desired output to the subsequent stages.



Fig.10. The multiplexer circuit used to choose between the different outputs (divide-by-two or divide-by-three) through a selection input (SEL)
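The output-stage selection described above can be sketched at cycle level as follows. This is a simplified behavioural model, not the transistor-level circuit: both dividers are represented by free-running counters sampled once per rising clock edge, so duty cycle and MUX delay are not modelled, and the function name is hypothetical.

```python
def dual_modulus_output(mod: int, n_clock_cycles: int) -> list[int]:
    """Output level after each rising clock edge for a given MOD setting."""
    q2 = 0      # state of the free-running divide-by-2
    count3 = 0  # state of the free-running divide-by-3 (0, 1, 2)
    out = []
    for _ in range(n_clock_cycles):
        q2 ^= 1                        # divide-by-2 toggles every cycle
        count3 = (count3 + 1) % 3      # divide-by-3 advances every cycle
        q3 = 1 if count3 == 0 else 0   # one output pulse per three cycles
        out.append(q3 if mod else q2)  # 2:1 MUX: MOD=0 -> /2, MOD=1 -> /3
    return out


div2_wave = dual_modulus_output(mod=0, n_clock_cycles=6)
div3_wave = dual_modulus_output(mod=1, n_clock_cycles=6)
print(div2_wave)
print(div3_wave)
```

Both counters advance on every cycle regardless of `mod`, which mirrors the point that the unselected divider keeps running and only the forwarded output changes.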

Key advantages of this architecture include:

1. High-speed operation due to M-TSPC's dynamic evaluation and short signal paths.
2. Minimal additional hardware, as the modulus selection requires only a compact MUX.
3. Clocking simplicity, since both  $\div 2$  and  $\div 3$  operate under the same single-phase input clock.
4. Low-voltage compatibility, inherited from the robustness of the M-TSPC flip-flop design.

### 6.2. Test-Bench and Simulation

The testbench for the dual-modulus divider mirrors the methodology used in the earlier  $\div 2$  analysis. However, it is not concerned with the FOM; its purpose is to validate the output of the dual-modulus circuit. A pulse-based clock generator drives both divider blocks simultaneously. A digital control input, **MOD**, selects the effective modulus by controlling the output MUX.

Simulation goals included:

- verifying the correctness of the  $\div 3$  counting sequence,
- confirming that the  $\div 2$  and  $\div 3$  blocks operate concurrently without interaction,
- validating the output frequency ratio by counting transitions over multiple cycles.
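The transition-counting check in the last goal can be sketched as below. This is an illustrative post-processing routine, not the ADE measurement actually used: it assumes the input and output waveforms are exported as equally long sample lists covering a whole number of output periods, and the ideal square waves are synthetic stand-ins rather than simulation results.

```python
def count_rising_edges(samples, threshold):
    """Count upward crossings of `threshold` in a sampled waveform."""
    return sum(
        1 for prev, cur in zip(samples, samples[1:])
        if prev < threshold <= cur
    )


def measured_divide_ratio(clk, out, threshold):
    """Ratio of input to output rising edges over the same time window."""
    return count_rising_edges(clk, threshold) / count_rising_edges(out, threshold)


# Synthetic ideal waveforms (stand-ins for exported simulation data).
VDD = 0.7
clk = [0.0, 0.0, VDD, VDD] * 12         # 12 input periods, 4 samples each
out_div2 = ([0.0] * 4 + [VDD] * 4) * 6  # period = 2 input periods
out_div3 = ([0.0] * 6 + [VDD] * 6) * 4  # period = 3 input periods

ratio_mod0 = measured_divide_ratio(clk, out_div2, VDD / 2)
ratio_mod1 = measured_divide_ratio(clk, out_div3, VDD / 2)
print(ratio_mod0, ratio_mod1)
```

Counting over many cycles averages out start-up effects; a window that cuts an output period short would bias the ratio, so the window length should be chosen accordingly.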

Voltage-scaling tests were also performed. As expected from dynamic logic operation, the M-TSPC cells maintained stable behavior and full divide functionality over the nominal voltage range. Operation near threshold increased sensitivity to charge leakage on internal dynamic nodes, but both modulus paths remained functional within the reliable operating limits established earlier. As expected, however, these limits fell short of the standalone divide-by-two results: the dual-modulus circuit was functional at a clock frequency of 2 GHz with a 0.7 V supply. These conditions are still acceptable, although less impressive than those of the standalone divide-by-two.



Fig.11. The output of the divide-by-two circuit (MOD = 0) within the dual-modulus circuit



Fig.12. The output of the divide-by-three circuit (MOD = 1) within the dual-modulus circuit

### 6.3. Potential Improvement Paths

Although the M-TSPC dual-modulus divider demonstrates promising speed and functionality, several enhancement opportunities remain. One potential improvement involves integrating clock-gating techniques to minimize unnecessary precharge activity during the inactive modulus. This would reduce dynamic power and improve overall energy efficiency without affecting speed.

Another direction is refining the control logic to ensure glitch-free transitions even under aggressive voltage scaling. Implementing small synchronizing elements or hazard-free switching networks can mitigate the sensitivity of dynamic nodes to abrupt control changes. Additionally, applying selective LVT devices in the gating path may further extend the maximum operating frequency, similar to the benefit observed in the base  $\div 2$  divider.

Finally, a more advanced extension involves incorporating the dual-modulus divider into a prescaler chain (e.g.,  $\div 2/3$  followed by a programmable counter), where interaction between high-speed M-TSPC stages and slower digital logic introduces additional timing constraints. Evaluating these system-level effects would help establish the full practical applicability of the design in modern frequency synthesizer environments.

## 7. Discussion

### 7.1. Innovation

The proposed divide-by-two circuit incorporates several noteworthy innovations that extend the capabilities of conventional TSPC-based designs. Most significantly, the selective use of low-threshold (LVT) devices in the output regeneration stage enables reliable multi-GHz operation at a supply voltage as low as 0.42 V, a region where standard-threshold devices fail to provide sufficient drive. This targeted application of LVT devices increases performance without incurring the leakage penalty of a fully low-threshold implementation.

In addition, the design employs a systematic two-dimensional sweep of both supply voltage and input clock frequency, allowing the identification of the true optimal operating corner rather than relying on nominal conditions. This methodology provides a more robust understanding of dynamic-logic limitations under aggressive voltage scaling and offers a practical framework for future low-power GHz-class circuit optimization.
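The sweep procedure can be summarised by the sketch below. It is an outline under stated assumptions, not the actual ADE setup: `simulate` is a hypothetical callback standing in for one transient run, FOM is assumed to be frequency divided by power, and `toy_simulate` is a placeholder model with arbitrary constants that does not represent simulation data from this work.

```python
def best_operating_point(vdd_values, freq_values_ghz, simulate):
    """Return (fom, vdd, f_ghz) with the highest FOM among functional points.

    `simulate(vdd, f_ghz)` is a hypothetical callback returning
    (is_functional, power_mw) for one transient run.
    """
    best = None
    for vdd in vdd_values:
        for f_ghz in freq_values_ghz:
            functional, power_mw = simulate(vdd, f_ghz)
            if not functional:
                continue  # discard points where division fails
            fom = f_ghz / power_mw
            if best is None or fom > best[0]:
                best = (fom, vdd, f_ghz)
    return best


def toy_simulate(vdd, f_ghz):
    """Placeholder model, NOT simulation data; all constants are arbitrary."""
    f_max_ghz = 10.0 * (vdd - 0.25)                      # made-up speed limit
    power_mw = 1e-3 * vdd ** 2 * f_ghz + 1e-4 * vdd      # dynamic + static terms
    return f_ghz <= f_max_ghz, power_mw


vdds = [0.4, 0.5, 0.6, 0.7]
freqs = [1.0, 2.0, 3.0, 4.0]
fom, vdd, f_ghz = best_operating_point(vdds, freqs, toy_simulate)
print(f"best point: VDD = {vdd} V, f = {f_ghz} GHz, FOM = {fom:.0f}")
```

The key design choice is that non-functional points are filtered out before the FOM comparison, so the optimum is always a corner at which the divider actually divides.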

Finally, careful transistor sizing and minimized internal capacitance contribute to improved dynamic efficiency. These refinements, combined with the enhanced evaluation and regeneration paths, demonstrate that TSPC logic can be effectively adapted for energy-efficient high-speed operation in modern low-voltage CMOS technologies.

### 7.2. Limitations and Trade-offs

Despite these strengths, the design remains subject to several inherent limitations of dynamic CMOS logic. The TSPC architecture relies on charge storage at internal dynamic nodes, making it sensitive to leakage currents, charge sharing, and device mismatch, all of which become more pronounced as the supply voltage is reduced. Although the design operates reliably at 0.42 V, further voltage scaling is constrained by the diminishing gate overdrive available to the evaluation network.

The selective introduction of LVT devices also introduces a measurable increase in static leakage, particularly at elevated temperatures. While dynamic power dominates at 2 GHz and masks this effect, applications involving long idle periods may experience degraded overall power efficiency. Additionally, the measured output duty cycle of 47.59% reflects the inherent rise/fall asymmetry of dynamic logic operating at low voltage and may require correction in systems that demand strict timing symmetry.

Finally, the design's high-speed performance is achieved at the cost of **reduced noise immunity** compared to static logic styles. Dynamic logic nodes do not provide the same robustness against supply ripple or input glitches, and careful system-level integration is required to prevent metastability or erroneous switching in noisy environments.

Overall, while the proposed design achieves excellent speed and energy efficiency, these gains must be weighed against dynamic-node reliability, leakage sensitivity, and the inherent limitations of operating close to the threshold-voltage region.

## 8. Conclusion

This project successfully demonstrates the design and simulation of a high-speed CMOS divide-by-2 frequency divider and a dual-modulus (divide-by-2 or divide-by-3) divider using a 65 nm process design kit. The divide-by-2 circuit achieves stable operation at frequencies up to 2 GHz, delivering a strong figure of merit of  $5511 \frac{\text{GHz}}{\text{mW}}$  through carefully optimized transistor sizing and architectural choices. The dual-modulus divider achieves reliable operation at frequencies up to 2 GHz, verifying programmable division functionality, although its FOM was not evaluated.

The analysis highlights key CMOS high-frequency design considerations, including trade-offs between speed, power consumption, and noise immunity, as well as the impact of transistor sizing and logic style on performance. Additional innovations introduced in this work, such as the selective use of low-threshold devices in the regeneration stage to sustain multi-GHz operation at ultra-low supply voltages and the systematic VDD-frequency sweep methodology used to identify the true optimal operating point, further enhanced the competitive performance and extended the usable operating range of the TSPC architecture. Overall, this work demonstrates the practical feasibility of designing energy-efficient, high-speed frequency dividers suitable for modern VLSI systems.

## 9. References

[1] Abbas, K. (2020). Handbook of digital CMOS technology, circuits, and systems. Springer International Publishing. <https://link.springer.com/book/10.1007/978-3-030-37195-1>

[2] Razavi, B. (2016). A Circuit for All Seasons. IEEE Solid-State Circuits Magazine, Fall 2016, 10-15. <https://www.seas.ucla.edu/brweb/papers/Journals/BRFall16TSPC.pdf>

[3] M binti Omar, "Design Current Mode Logic (CML) Frequency Divider in CMOS," 2009. <https://picture.iczhiku.com/resource/eetop/wyKTWeZRewRAIVmc.pdf>

[4] CJ Ritter, "Design and simulation of a current-mode logic frequency divider." <https://scholars.csus.edu/esploro/outputs/graduate/Design-and-simulation-of-a-current-mode/99257831019401671>

[5] Hwang, Y.-T., & Lin, J.-F. (2012). Low voltage and low power divide-by-2/3 counter design using pass transistor logic circuit technique. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 20(9), 1738–1742. <https://doi.org/10.1109/TVLSI.2011.2161598>

[6] Krishna, M. V., Do, M. A., Yeo, K. S., Boon, C. C., & Lim, W. M. (2010). Design and analysis of ultra low power true single phase clock CMOS 2/3 prescaler. IEEE Transactions on Circuits and Systems I: Regular Papers, 57(1), 72–82. <https://doi.org/10.1109/TCSI.2009.2016183>