

# Full-Custom CMOS 3-Bit Synchronous Counter: Transistor-Level Design and Post-Layout Characterization

Roberto Ibáñez Mingarro

Electronics & Embedded Systems Engineering (Double Degree: INSA Toulouse & Universitat Jaume I)

**Abstract:** This work presents a full-custom 3-bit synchronous CMOS counter implemented at transistor level and operating at $V_{DD} = 0.65$ V. The design was fully laid out and validated post-layout, with propagation delays between 90 ps and 172 ps across process corners and correct counting under high-frequency excitation. *Intrinsic gate delay does not define system $f_{max}$; sequential timing constraints and clock integrity ultimately bound performance.*



**Figure 1:** Expected synchronous counting sequence (conceptual timing diagram).

## 1 Design Goals and Context

Objective: implement and validate a **fully synchronous 3-bit binary up-counter at transistor level**, including layout and post-layout verification.

### Target specifications

- 3-bit synchronous up-counter outputs ( $Q_2, Q_1, Q_0$ )
- Edge-triggered clocking (master–slave storage)
- Supply voltage:  $V_{DD} = 0.65$  V (low-voltage regime)
- Post-layout transient validation and corner awareness (min/typ/max)

### Engineering intent

- Apply full-custom CMOS design principles
- Link parasitics to waveform behavior
- Distinguish gate delay from sequential timing limits

## 2 Logic Architecture

The synchronous next-state equations are:

$$\begin{aligned} Q_0^+ &= \overline{Q_0} \\ Q_1^+ &= Q_1 \oplus Q_0 \\ Q_2^+ &= Q_2 \oplus (Q_1 Q_0) \end{aligned}$$

Each $Q_i^+$ drives the D input of an edge-triggered master–slave D flip-flop. This guarantees that state updates occur only at the active clock edge, preventing level-sensitive transparency.

The expected binary evolution is illustrated in Figure 1.
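As a quick behavioral check of the next-state equations, the following minimal sketch steps the three equations through nine states. It models only the Boolean logic, not the transistor-level circuit; the function names are illustrative and not part of the design.

```python
def next_state(q2: int, q1: int, q0: int) -> tuple[int, int, int]:
    """Apply the synchronous next-state equations to one (Q2, Q1, Q0) state."""
    q0_next = 1 - q0            # Q0+ = NOT Q0
    q1_next = q1 ^ q0           # Q1+ = Q1 XOR Q0
    q2_next = q2 ^ (q1 & q0)    # Q2+ = Q2 XOR (Q1 AND Q0)
    return q2_next, q1_next, q0_next


def count_sequence(n_edges: int, start=(0, 0, 0)) -> list[int]:
    """Return the decimal state at the start and after each of n_edges clock edges."""
    state = start
    values = [state[0] * 4 + state[1] * 2 + state[2]]
    for _ in range(n_edges):
        state = next_state(*state)
        values.append(state[0] * 4 + state[1] * 2 + state[2])
    return values


sequence = count_sequence(8)
print(sequence)  # [0, 1, 2, 3, 4, 5, 6, 7, 0]
```

Eight clock edges take the model from 000 through 111 and back to 000, which is the modulo-8 behavior the equations imply.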



**Figure 2:** CMOS inverter layout used as sizing and parasitic reference.

## 3 Transistor-Level Building Blocks

### 3.1 CMOS Inverter (Reference Cell)

The inverter is the *electrical reference* used to reason about sizing, noise margins, and rise/fall symmetry.

The implemented layout is shown in Figure 2.

### What “good” sizing tries to achieve

- PMOS width  $>$  NMOS width to compensate mobility ( $\mu_p < \mu_n$ )
- Comparable pull-up/pull-down strength for symmetric  $t_{pLH}$  and  $t_{pHL}$
- Low output resistance while keeping input capacitance under control

**Why small deviations matter at 0.65 V** At low  $V_{DD}$ , overdrive ( $V_{GS} - V_T$ ) is reduced, so current drops sharply and timing becomes more sensitive to: (1) width ratio errors, (2) stacked devices (effective resistance), (3) parasitic capacitances (especially  $C_{gd}$ ), and (4) interconnect RC.
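The sensitivity argument can be illustrated with the alpha-power-law approximation, in which drive current scales as $(V_{DD} - V_T)^{\alpha}$ and delay as $V_{DD} / (V_{DD} - V_T)^{\alpha}$. The sketch below is a rough first-order model under stated assumptions: the threshold voltage (0.30 V), the exponent (1.3), the 30 mV shift, and the 1.2 V comparison supply are all hypothetical values chosen for illustration, not parameters of this design or process.

```python
def relative_delay(vdd: float, vt: float, alpha: float = 1.3) -> float:
    """Delay figure of merit proportional to VDD / (VDD - VT)**alpha (arbitrary units)."""
    overdrive = vdd - vt
    if overdrive <= 0:
        raise ValueError("this simple model only covers VDD above VT")
    return vdd / overdrive ** alpha


def delay_change(vdd: float, vt: float = 0.30, vt_shift: float = 0.03,
                 alpha: float = 1.3) -> float:
    """Fractional delay increase caused by raising VT by vt_shift at a given VDD."""
    return relative_delay(vdd, vt + vt_shift, alpha) / relative_delay(vdd, vt, alpha) - 1.0


# Hypothetical comparison: the same +30 mV threshold shift at two supplies.
for vdd in (0.65, 1.2):
    print(f"VDD = {vdd:.2f} V: +30 mV on VT -> delay {delay_change(vdd):+.1%}")
```

Under these assumed values the same threshold shift costs noticeably more delay at 0.65 V than at 1.2 V, because the overdrive it erodes is much smaller to begin with.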

### 3.2 NAND Gate (Static CMOS)

NAND gates implement the combinational portions required by XOR/AND decompositions.

The full-custom layout of this gate is shown in Figure 3.

**Figure 3:** CMOS NAND layout (routing and diffusion sharing visible).

**Figure 4:** Transistor-level master–slave D flip-flop layout.

### Stack effect and sizing

- Two series NMOS devices increase effective $R_{on}$ and degrade $t_{pHL}$
- Common mitigation: up-size stacked NMOS or limit stack depth
- The PMOS network is in parallel for a NAND, so the pull-up path is typically less critical

This gate is a primary source of *charge sharing* and *dynamic internal node behavior* when multiple transistors switch near-simultaneously, especially under fast clocks.
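The stack penalty can be sketched with a first-order switch-RC model, $t_{pHL} \approx \ln(2)\, R_{path}\, C_{load}$, where the series devices add their on-resistances. The unit resistance (10 kΩ) and load (2 fF) below are hypothetical placeholders, and the model deliberately ignores internal-node capacitance and the extra self-loading of wider devices, so it only shows the trend.

```python
import math


def t_phl_stack(n_series: int, width_scale: float,
                r_unit_ohm: float = 10e3, c_load_f: float = 2e-15) -> float:
    """First-order pull-down delay (seconds) for n_series NMOS devices in series.

    Each device has on-resistance r_unit_ohm / width_scale (resistance is
    assumed inversely proportional to width). r_unit_ohm and c_load_f are
    hypothetical illustration values.
    """
    r_path = n_series * r_unit_ohm / width_scale
    return math.log(2) * r_path * c_load_f


inverter = t_phl_stack(n_series=1, width_scale=1.0)
nand2_min = t_phl_stack(n_series=2, width_scale=1.0)
nand2_upsized = t_phl_stack(n_series=2, width_scale=2.0)

for name, t in [("inverter", inverter), ("NAND2, minimum width", nand2_min),
                ("NAND2, 2x NMOS width", nand2_upsized)]:
    print(f"{name}: {t * 1e12:.1f} ps")
```

In this simplified picture, doubling the width of both stacked NMOS devices restores the inverter-equivalent pull-down resistance, which is the reasoning behind the up-sizing mitigation listed above.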

### 3.3 Master–Slave D Flip-Flop (Edge-Triggered Storage)

A master–slave topology was selected to enforce edge-triggered behavior and isolate combinational logic from the stored state during most of the clock cycle.

The implemented flip-flop layout is shown in Figure 4.

#### Why this matters physically

- Avoids level-sensitive transparency that can amplify hazards in the logic cone
- Provides a well-defined timing interface:  $t_{CQ}$ ,  $t_{setup}$ ,  $t_{hold}$
- However, introduces internal dynamic nodes whose parasitics impact timing and waveform integrity

## 4 Full Counter Layout Integration

The full-custom integration (logic + 3 DFFs + clock distribution) is shown in Figure 5.

#### Layout-level priorities

- Clock routing as a first-class signal (skew and edge quality matter)
- Interconnect length minimization on critical nodes (D inputs, internal FF nodes)
- Diffusion sharing where safe to reduce area/capacitance, but avoiding unintended coupling
- Local symmetry where feasible to reduce systematic mismatch



**Figure 5:** Complete 3-bit synchronous counter full-custom layout.



**Figure 6:** Post-layout transient waveforms showing correct synchronous counting behavior.

## 5 Post-Layout Validation (Transient)

#### Simulation conditions

- Tool: Microwind (post-layout extracted behavior)
- $V_{DD} = 0.65 \text{ V}$ ,  $T = 25^\circ\text{C}$
- High-frequency excitation used for stress validation

Representative post-layout waveforms are shown in Figure 6.

**Functional outcome** The counter follows the expected binary sequence:

$$000 \rightarrow 001 \rightarrow 010 \rightarrow \dots \rightarrow 111 \rightarrow 000$$

No sustained oscillation or metastable lock was observed in the captured window.

## 6 Signal Integrity: Why Overshoot/Undershoot Appears

**Table 1:** Post-layout propagation delays across process corners.

| Metric         | Min | Typ | Max |
|----------------|-----|-----|-----|
| $t_{pLH}$ (ps) | 154 | 100 | 90  |
| $t_{pHL}$ (ps) | 172 | 110 | 98  |

Measured waveforms exhibit:

- Residual voltages around 100 mV to 140 mV on some nodes
- Minor undershoot and small spikes around transitions

#### Physical causes (post-layout reality)

- **Miller coupling ( $C_{gd}$ ):** fast input transitions inject charge into the output
- **Interconnect capacitance and coupling:** neighboring wires and diffusion add dynamic crosstalk
- **Charge sharing in stacked networks:** internal nodes exchange charge during switching
- **Finite clock slope and clock feedthrough:** FF internal nodes are sensitive to clock edge quality
- **Low  $V_{DD}$  noise margins:** the same absolute disturbance represents a larger fraction of logic swing

**Engineering interpretation** These artifacts do *not* by themselves imply logic failure. Operation remains correct as long as the disturbed levels stay within the noise margins and every node settles before the sampling clock edge.
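To put the reported residual levels in proportion, the short calculation below expresses 100 mV and 140 mV as a fraction of the 0.65 V logic swing. The 1.2 V comparison supply is a hypothetical reference added only to show why the same absolute disturbance matters more at low $V_{DD}$.

```python
VDD = 0.65            # supply used in the post-layout simulations, in volts
VDD_REFERENCE = 1.2   # hypothetical higher supply, for comparison only


def fraction_of_swing(disturbance_v: float, vdd: float) -> float:
    """Return a disturbance amplitude as a fraction of the full logic swing."""
    return disturbance_v / vdd


for residual_mv in (100, 140):
    at_design = fraction_of_swing(residual_mv / 1000, VDD)
    at_reference = fraction_of_swing(residual_mv / 1000, VDD_REFERENCE)
    print(f"{residual_mv} mV: {at_design:.1%} of swing at {VDD} V, "
          f"{at_reference:.1%} at {VDD_REFERENCE} V")
```

At 0.65 V the residuals correspond to roughly 15% to 22% of the swing, about twice the fraction they would represent at the assumed 1.2 V reference.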

## 7 Timing: Consistent Interpretation

### 7.1 Propagation Delay Measurements (Gate-Level)

Propagation delays were measured from input 50% crossing to output 50% crossing under post-layout conditions. The measured propagation delays are summarized in Table 1.
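The 50%-to-50% measurement can be reproduced on exported waveform samples with a few lines of code. The sketch below is a generic implementation using linear interpolation between samples; it is not the tool's own measurement routine, and the ramp waveforms are synthetic test data, not results from this design.

```python
def crossing_time(times, volts, level, rising):
    """Return the first time the sampled waveform crosses `level` in the given direction."""
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        v0, v1 = volts[i], volts[i + 1]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            # Linear interpolation between the two bracketing samples.
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, v_out, vdd, in_rising, out_rising):
    """Delay from the input 50% crossing to the output 50% crossing."""
    half = vdd / 2
    return (crossing_time(times, v_out, half, out_rising)
            - crossing_time(times, v_in, half, in_rising))


def ramp(t, t_start, t_end, v_from, v_to):
    """Synthetic linear ramp used only to exercise the measurement."""
    if t <= t_start:
        return v_from
    if t >= t_end:
        return v_to
    return v_from + (v_to - v_from) * (t - t_start) / (t_end - t_start)


times_ps = list(range(0, 301, 20))
v_in = [ramp(t, 0, 100, 0.0, 0.65) for t in times_ps]     # rising input
v_out = [ramp(t, 85, 285, 0.65, 0.0) for t in times_ps]   # falling output
t_phl_ps = propagation_delay(times_ps, v_in, v_out, 0.65,
                             in_rising=True, out_rising=False)
print(f"t_pHL = {t_phl_ps:.1f} ps")
```

For the synthetic ramps the input crosses 50% at 50 ps and the output at 185 ps, so the function returns 135 ps. With real waveforms, the sampling step should be small relative to the edge time so that the linear interpolation stays accurate.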

**What these numbers actually mean** Delays in the 90 ps to 172 ps range describe the *intrinsic switching speed* of individual stages under the measured loading conditions. They are per-stage figures, not a clock period.
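Two derived figures help when reading Table 1: the rise/fall asymmetry $t_{pHL}/t_{pLH}$ and the conventional average $t_p = (t_{pLH} + t_{pHL})/2$. The sketch below computes both from the table values; the dictionary layout is just an illustrative way of holding the data.

```python
# Propagation delays from Table 1, in picoseconds, keyed by process corner.
delays_ps = {
    "min": {"t_pLH": 154, "t_pHL": 172},
    "typ": {"t_pLH": 100, "t_pHL": 110},
    "max": {"t_pLH": 90, "t_pHL": 98},
}

# Ratio above 1 means the falling output transition is the slower one.
asymmetry = {corner: d["t_pHL"] / d["t_pLH"] for corner, d in delays_ps.items()}

# Conventional average propagation delay per corner.
average_ps = {corner: (d["t_pHL"] + d["t_pLH"]) / 2 for corner, d in delays_ps.items()}

for corner in delays_ps:
    print(f"{corner}: t_pHL/t_pLH = {asymmetry[corner]:.3f}, "
          f"average t_p = {average_ps[corner]:.0f} ps")
```

In every corner $t_{pHL}$ is about 9% to 12% longer than $t_{pLH}$, so the pull-down path is consistently the slower one while the two edges remain reasonably balanced.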

### 7.2 Why Gate Delay $\neq$ System $f_{\max}$

A synchronous system must satisfy, for every reg-to-reg path:

$$T_{clk} \geq t_{CQ} + t_{logic} + t_{setup} + t_{skew} + t_{margin}$$

Therefore, even if individual gates are sub-ns, practical frequency can be limited by:

- **$t_{CQ}$  of master-slave FF:** internal dynamic nodes and clock feedthrough add delay
- **Setup/hold:** low $V_{DD}$ slows the internal latch nodes, which lengthens the setup window and shrinks timing margins
- **Clock distribution:** skew and edge degradation affect sampling instant
- **Worst-case internal switching:** simultaneous switching noise, charge sharing, and coupling
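The period constraint can be turned into a simple budget calculation. In the sketch below only the 172 ps per-gate figure comes from Table 1 (worst case); the logic depth, $t_{CQ}$, $t_{setup}$, skew, and margin are hypothetical placeholders, since the source does not report them. The result is therefore an illustration of the method, not an $f_{\max}$ for this counter.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Timing budget of one register-to-register path, all values in picoseconds."""
    t_cq_ps: float
    t_logic_ps: float
    t_setup_ps: float
    t_skew_ps: float
    t_margin_ps: float

    def min_period_ps(self) -> float:
        """Smallest clock period satisfying T_clk >= t_CQ + t_logic + t_setup + t_skew + t_margin."""
        return (self.t_cq_ps + self.t_logic_ps + self.t_setup_ps
                + self.t_skew_ps + self.t_margin_ps)

    def f_max_ghz(self) -> float:
        """Maximum clock frequency in GHz (1000 / period in ps)."""
        return 1e3 / self.min_period_ps()


WORST_GATE_DELAY_PS = 172   # worst-case t_pHL from Table 1
ASSUMED_LOGIC_DEPTH = 4     # hypothetical number of gate levels on the critical path

path = PathTiming(
    t_cq_ps=300.0,                                        # hypothetical
    t_logic_ps=ASSUMED_LOGIC_DEPTH * WORST_GATE_DELAY_PS,
    t_setup_ps=200.0,                                     # hypothetical
    t_skew_ps=50.0,                                       # hypothetical
    t_margin_ps=100.0,                                    # hypothetical
)
print(f"T_clk >= {path.min_period_ps():.0f} ps, f_max ~ {path.f_max_ghz():.2f} GHz")
```

Even with sub-200 ps gates, the assumed budget already exceeds 1.3 ns, which shows how the sequential terms, rather than a single gate delay, set the achievable clock rate.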

**Table 2:** Total power consumption across process corners.

| Condition      | Min  | Typ   | Max   |
|----------------|------|-------|-------|
| $P_{tot}$ (μW) | 3.66 | 27.62 | 381.4 |

**Sanity check (unit consistency)** If a design’s key delays are on the order of 100 ps, the corresponding time scale is 0.1 ns, i.e., *multi-GHz intrinsic potential*. Any reported  $f_{\max}$  in the tens of MHz range must be justified by full sequential timing measurements (including  $t_{CQ}$ ,  $t_{setup}$ , and clock integrity), not by gate  $t_p$  alone.
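The order-of-magnitude arithmetic behind this check, taking 100 ps as a representative delay and 50 MHz as an arbitrary example of a frequency in the tens of MHz range, is:

$$f_{intrinsic} \sim \frac{1}{t_p} = \frac{1}{100\ \text{ps}} = 10\ \text{GHz}, \qquad T_{clk}\big|_{50\ \text{MHz}} = 20\ \text{ns} = 200 \times 100\ \text{ps}$$

A clock period equal to some 200 gate delays cannot be explained by gate delay alone, which is why the remaining terms of the sequential timing budget have to be measured.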

## 8 Power Characterization

Dynamic power follows the CMOS relationship:

$$P_{dyn} \approx \alpha C V_{DD}^2 f$$

Measured total power across corners is summarized in Table 2.
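The dynamic-power relationship is easy to evaluate numerically. The sketch below is a minimal illustration in which the activity factor, switched capacitance, clock frequency, and the 1.2 V comparison supply are hypothetical values; only $V_{DD} = 0.65$ V comes from the design. It does not reproduce the measured totals, which also include short-circuit and leakage contributions.

```python
def dynamic_power_w(alpha: float, c_farad: float, vdd: float, f_hz: float) -> float:
    """Dynamic power P_dyn ~ alpha * C * VDD^2 * f, in watts."""
    return alpha * c_farad * vdd ** 2 * f_hz


def vdd_scaling(vdd_new: float, vdd_ref: float) -> float:
    """Ratio of dynamic power at vdd_new to that at vdd_ref, all else equal."""
    return (vdd_new / vdd_ref) ** 2


# Hypothetical operating point: alpha = 0.2, C = 50 fF, f = 1 GHz.
p_dyn = dynamic_power_w(alpha=0.2, c_farad=50e-15, vdd=0.65, f_hz=1e9)
ratio = vdd_scaling(0.65, 1.2)

print(f"P_dyn = {p_dyn * 1e6:.3f} uW at the assumed operating point")
print(f"Dynamic power at 0.65 V is {ratio:.1%} of that at 1.2 V")
```

The quadratic term is what makes low-voltage operation attractive: under the same assumed activity, capacitance, and frequency, moving from 1.2 V to 0.65 V cuts dynamic power to under a third.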

#### Interpretation

- Large corner spread is expected: mobility,  $V_T$ , and effective drive strengths shift both delay and short-circuit components
- At low  $V_{DD}$ , leakage and short-circuit contributions can become more sensitive to process
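- The dynamic component is expected to follow the quadratic $V_{DD}^2$ dependence; because Table 2 reports total power per process corner rather than a supply sweep, it does not isolate that dependence on its own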

## 9 Engineering Takeaways (Analog Devices-Oriented)

This project demonstrates full-custom CMOS competence:

- **Transistor-level design** with static logic and edge-triggered storage
- **Layout-aware analysis** of parasitics and routing impact
- **Signal integrity insight** (coupling, Miller effect, charge sharing)
- **Sequential timing discipline** beyond isolated gate delay
- **Power-performance awareness** across process corners

A 3-bit synchronous counter was implemented in full-custom CMOS at  $V_{DD} = 0.65$  V and validated post-layout. Propagation delays range from 90 ps to 172 ps, with parasitic-driven waveform artifacts remaining compatible with correct operation. *System performance is ultimately bounded by sequential timing closure and clock integrity, not by an isolated gate delay.*