



**Department of Electrical Engineering and Computer Science**

Howard University

**EECE 423: VLSI Design**

FALL 2025

**Final Project: 4-Bit CMOS Ripple-Carry Adder**

By:

Isioma Nwansoh

Mohammed Akinbayo

**Instructor: Dr. Hassan Salmani**

## Introduction

The goal of this project was to design, simulate, and analyze a 4-bit ripple-carry adder (RCA) at the transistor level using HSPICE. The assignment required:

- Building a static CMOS NAND gate as a reusable subcircuit.
- Constructing a 1-bit full adder using only NAND gates.
- Cascading four full adders into a 4-bit RCA using subcircuit instantiation.
- Performing dynamic simulations to measure worst-case delay, power, and Power-Delay Product (PDP) under different transistor sizing strategies.

This report explains the design logic, shows how the simulations were set up, presents the measured results (delay, power, PDP), and discusses the power–performance trade-offs for the three sizing strategies:

1. PMOS-only scaling
2. NMOS-only scaling
3. Proportional scaling of both PMOS and NMOS

## Circuit Design and Hierarchy

[https://github.com/Mohammed532/VLSILab_2025/blob/main/FINAL/](https://github.com/Mohammed532/VLSILab_2025/blob/main/FINAL/)

The testbench included voltage sources for VDD and GND, and PWL input sources for A[3:0] and B[3:0] that step through a sequence of input combinations to exercise carry propagation and high switching activity.

## 2-Input CMOS NAND Subcircuit

The basic building block is a 2-input static CMOS NAND gate defined as a `.SUBCKT`:

```
.subckt NAND a b y vdd gnd
* Pull-down network: two NMOS devices in series
MMN1 y a x   gnd n105_HVT l=0.03u w='0.54u*WN'
MMN2 x b gnd gnd n105_HVT l=0.03u w='0.54u*WN'
* Pull-up network: two PMOS devices in parallel
MMP1 y a vdd vdd p105_HVT l=0.03u w='0.27u*WP'
MMP2 y b vdd vdd p105_HVT l=0.03u w='0.27u*WP'
.ends
```

The parameters `WP` and `WN` are global width-scaling factors; they are changed later through `.alter` blocks to implement the different sizing experiments.

## NAND-Only 1-Bit Full Adder (Indian RCA Style)

The 1-bit full adder is implemented using only NAND gates and instantiated as a subcircuit:

```
.subckt adder a b cin sum cout vdd gnd

X1 a    b    n1    vdd gnd NAND
X2 a    n1   n2    vdd gnd NAND
X3 b    n1   n3    vdd gnd NAND
X4 n2   n3   n4    vdd gnd NAND $ A xor B

X5 n4   cin  n5    vdd gnd NAND
X6 n4   n5   n6    vdd gnd NAND
X7 cin  n5   n7    vdd gnd NAND
X8 n6   n7   sum   vdd gnd NAND $ (A xor B) xor Cin

X9 n1   n5   cout  vdd gnd NAND $ carry: majority of A, B, Cin

.ends adder
```

### Logic explanation:

Gates X1–X4 implement  $A \oplus B$  using a standard NAND-only XOR construction.

Gates X5–X8 take this intermediate XOR and combine it with ‘Cin’ to implement

$$\text{SUM} = A \oplus B \oplus \text{Cin}.$$

Gate X9 combines internal nodes `n1` (related to A and B) and `n5` (related to Cin and A $\oplus$ B) to generate COUT, effectively implementing the majority function of the three inputs.

Because this design is fully built from NAND gates, it maps naturally to the previously defined subcircuit and makes resizing straightforward (just change `WP` and `WN` globally).
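The gate-level description above can be checked exhaustively in software. The following Python sketch mirrors gates X1 to X9 one-to-one and compares the result against ordinary integer addition for all eight input combinations. The helper names `nand` and `full_adder` are illustrative and do not come from the project netlist.

```python
from itertools import product


def nand(x, y):
    """2-input NAND on 0/1 values."""
    return 1 - (x & y)


def full_adder(a, b, cin):
    """Mirror gates X1..X9 of the `adder` subcircuit; returns (sum, cout)."""
    n1 = nand(a, b)       # X1
    n2 = nand(a, n1)      # X2
    n3 = nand(b, n1)      # X3
    n4 = nand(n2, n3)     # X4: A xor B
    n5 = nand(n4, cin)    # X5
    n6 = nand(n4, n5)     # X6
    n7 = nand(cin, n5)    # X7
    s = nand(n6, n7)      # X8: (A xor B) xor Cin
    cout = nand(n1, n5)   # X9: A*B + (A xor B)*Cin
    return s, cout


# Exhaustive check against integer addition.
for a, b, cin in product((0, 1), repeat=3):
    total = a + b + cin
    assert full_adder(a, b, cin) == (total % 2, total // 2)
```

The X9 comment reflects De Morgan's law: NAND(n1, n5) equals NOT(n1) OR NOT(n5), which is A·B + (A ⊕ B)·Cin.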

## 4-Bit Ripple-Carry Adder

The 4-bit RCA is formed by cascading four full-adder instances:

```
Xrc1 A0 B0 CIN S0 C0 vdd gnd adder N='N' P='P'  
Xrc2 A1 B1 C0 S1 C1 vdd gnd adder N='N' P='P'  
Xrc3 A2 B2 C1 S2 C2 vdd gnd adder N='N' P='P'  
Xrc4 A3 B3 C2 S3 C3 vdd gnd adder N='N' P='P'
```

The carry-out of each stage becomes the carry-in of the next. The worst-case path is therefore a carry that is generated at the least significant bit (A0, B0) and ripples through C0, C1, and C2 all the way to C3 (COUT).
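The cascading can be illustrated with a small behavioural model. This is a Python sketch of the ideal logic only (no timing); the function names and the returned list of per-stage carries are illustrative choices, not part of the netlist.

```python
def full_adder(a, b, cin):
    """Behavioural 1-bit full adder; returns (sum, cout)."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return s, cout


def ripple_carry_add(a, b, cin=0, width=4):
    """Add two `width`-bit integers stage by stage.

    Returns (sum_value, carries) where carries[i] is the carry-out of
    stage i (C0..C3 for width=4).
    """
    carry = cin
    total = 0
    carries = []
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
        carries.append(carry)
    return total, carries


# 15 + 1: the carry generated in bit 0 propagates through every stage.
result, carries = ripple_carry_add(15, 1)
print(result, carries)
```

In this example every stage has to wait for the carry of the previous one, which is the situation that sets the worst-case COUT delay.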

## Testbench and Input Patterns

Inputs A[3:0] and B[3:0] are driven by PWL sources that emulate adding several different operand pairs over time. The sequence covers cases such as:

**0+0 → 15+0 → 15+15 → 0+15 → 0+1**

Each bit toggles at a different frequency, ensuring high switching activity and forcing carry propagation through different bit positions.



Figure: simulated input and output waveforms. Signals from top to bottom: A0, A1, A2, A3, B1, B2, B3, S0, C0, S1, C1, S2, C2, S3, C3.

The initial carry-in 'CIN' is held at 0 V in this project, which is typical for a basic RCA test.
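For reference when reading the waveforms, the expected outputs for the listed operand sequence can be tabulated with a few lines of Python. This is a sketch of the ideal arithmetic only (CIN = 0, no timing); the name `expected_outputs` is hypothetical.

```python
# Operand pairs (A, B) in the order given in the test sequence.
vectors = [(0, 0), (15, 0), (15, 15), (0, 15), (0, 1)]


def expected_outputs(a, b, cin=0):
    """Return (S[3:0] as an integer, C3) for a 4-bit addition."""
    total = a + b + cin
    return total & 0xF, total >> 4


expected = [(a, b) + expected_outputs(a, b) for a, b in vectors]

for a, b, s, c3 in expected:
    print(f"A={a:2d} B={b:2d} -> S={s:04b} C3={c3}")
```

Of these five vectors, only 15 + 15 is expected to assert C3.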

## Sizing Experiments

To study the power–delay trade-offs, we performed three sizing experiments using `.alter` statements in the testbench:

### 1. PMOS-only scaling:

```
.alter p_scale_2
.param P=2 N=1
.alter p_scale_3
.param P=3 N=1
.alter p_scale_4
.param P=4 N=1
```

### 2. NMOS-only scaling:

```
.alter n_scale_2
.param P=1 N=2
.alter n_scale_3
.param P=1 N=3
.alter n_scale_4
.param P=1 N=4
```

### 3. Proportional scaling (both):

```
.alter c_scale_2
.param P=2 N=2
.alter c_scale_3
.param P=3 N=3
.alter c_scale_4
.param P=4 N=4
```

`P` and `N` scale the PMOS and NMOS widths in the NAND subcircuit. Every device in the entire 4-bit adder is resized consistently, satisfying the assignment requirement that all PMOS or all NMOS devices across the design share the same W/L for each case.
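The resulting device widths for each case can be tabulated directly. The Python sketch below assumes that `P` multiplies the 0.27 µm PMOS base width and `N` multiplies the 0.54 µm NMOS base width from the NAND subcircuit; the `"baseline"` label and the function name are illustrative.

```python
BASE_WP_UM = 0.27  # PMOS base width in the NAND subcircuit (µm)
BASE_WN_UM = 0.54  # NMOS base width in the NAND subcircuit (µm)

# (P, N) scale factors per run; keys follow the .alter titles.
SIZING_CASES = {
    "baseline": (1, 1),
    "p_scale_2": (2, 1), "p_scale_3": (3, 1), "p_scale_4": (4, 1),
    "n_scale_2": (1, 2), "n_scale_3": (1, 3), "n_scale_4": (1, 4),
    "c_scale_2": (2, 2), "c_scale_3": (3, 3), "c_scale_4": (4, 4),
}


def nand_widths_um(p, n):
    """Return (PMOS width, NMOS width) in µm for the given scale factors."""
    return BASE_WP_UM * p, BASE_WN_UM * n


for name, (p, n) in SIZING_CASES.items():
    wp, wn = nand_widths_um(p, n)
    print(f"{name:10s} Wp={wp:.2f} um  Wn={wn:.2f} um  Wp/Wn={wp / wn:.2f}")
```

Under this assumption the baseline has Wp/Wn = 0.5, so the PMOS devices start out narrower than the NMOS devices, and PMOS-only scaling moves the ratio toward and past 1.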

## Measurement Setup

### Delay Measurements

Propagation delay is measured using `.measure` statements that trigger when an input crosses VDD/2 and target when an output crosses VDD/2 (the netlist expresses this threshold as `'SUPPLY/2'`). For example, one of the S0 measurements:

```
.measure tran S0_AhBl_RISE_DELAY
+ TRIG V(A0) VAL='SUPPLY/2' RISE=1
+ TARG V(S0) VAL='SUPPLY/2' RISE=1
```
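The TRIG/TARG measurement above amounts to subtracting two 50% crossing times. The following Python sketch shows the idea on sampled waveforms using linear interpolation. It is not how HSPICE is implemented internally; the function names, the 1.0 V supply, and the sample data are all hypothetical.

```python
def crossing_time(times, values, threshold, rising=True, occurrence=1):
    """Time of the n-th threshold crossing in the given direction.

    Uses linear interpolation between samples; returns None if the
    crossing never happens.
    """
    samples = list(zip(times, values))
    count = 0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if rising:
            crossed = v0 < threshold <= v1
        else:
            crossed = v0 > threshold >= v1
        if crossed:
            count += 1
            if count == occurrence:
                return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    return None


def propagation_delay(times, v_in, v_out, vdd, in_rising=True, out_rising=True):
    """Target crossing time minus trigger crossing time, both at vdd/2."""
    t_trig = crossing_time(times, v_in, vdd / 2, in_rising)
    t_targ = crossing_time(times, v_out, vdd / 2, out_rising)
    if t_trig is None or t_targ is None:
        return None
    return t_targ - t_trig


# Hypothetical samples (time in ps, voltage in V, VDD assumed to be 1.0 V).
times = [0.0, 10.0, 20.0, 30.0, 40.0]
v_in = [0.0, 1.0, 1.0, 1.0, 1.0]
v_out = [0.0, 0.0, 0.0, 1.0, 1.0]
print(propagation_delay(times, v_in, v_out, vdd=1.0))
```

Because both crossings are counted from the start of the record, the result is negative whenever the first matching output edge precedes the first matching input edge, which is one way negative entries can appear in a measurement listing.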

For COUT we focus on worst-case transitions involving carry rippling.

## Proportional Scaling

Four runs, in listing order:

| Measurement | Run 1 | Run 2 | Run 3 | Run 4 |
|-------------|-------|-------|-------|-------|
| `cout_cfahbl_fall_delay` | -13.2p | -12.8p | -12.7p | -12.6p |
| `cout_cfalbh_fall_delay` | 37.5p | 31p | 27.9p | 28.8p |
| `cout_crahbh_rise_delay` | -9.39p | -7.65p | -6.86p | -7.1p |
| `cout_crahbl_rise_delay` | 35.3p | 32.3p | 30.8p | 31.3p |

## PMOS only scaling

Three runs, in listing order:

| Measurement | Run 1 | Run 2 | Run 3 |
|-------------|-------|-------|-------|
| `cout_cfahbl_fall_delay` | -12.5p | -13.4p | -14.3p |
| `cout_cfalbh_fall_delay` | 26.2p | 22.2p | 20.2p |
| `cout_crahbh_rise_delay` | -6.54p | -5.64p | -5.26p |
| `cout_crahbl_rise_delay` | 29.8p | 27.9p | 27.5p |

## NMOS only scaling

Three runs, in listing order:

| Measurement | Run 1 | Run 2 | Run 3 |
|-------------|-------|-------|-------|
| `cout_cfahbl_fall_delay` | -15.5p | -18p | -20p |
| `cout_cfalbh_fall_delay` | 46.3p | 55.7p | 63.9p |
| `cout_crahbh_rise_delay` | -12.4p | -15.9p | -20p |
| `cout_crahbl_rise_delay` | 40.1p | 44.6p | 49p |

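From these we extract the worst-case rise and fall delays for COUT for each sizing case. Only positive values are used: a negative result means the target crossing occurred before the trigger crossing, so it does not represent a propagation delay.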

## Power Measurements

Average and peak power are obtained using `.measure` statements on the supply current:

- `avg_power`: average power over the entire transient
- `max_power`: peak instantaneous power
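As an illustration of what these two quantities represent, the Python sketch below computes a time-weighted average and a peak from sampled supply current. It is a sketch under assumptions: the `.measure` expressions used in the project are not reproduced here, and the function names, the 1.0 V supply, and the sample data are hypothetical.

```python
def average_power(times, currents, vdd):
    """Trapezoidal time average of vdd * i(t) over the whole record."""
    energy = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        energy += 0.5 * vdd * (currents[k] + currents[k - 1]) * dt
    return energy / (times[-1] - times[0])


def peak_power(currents, vdd):
    """Largest instantaneous vdd * i(t) among the samples."""
    return max(vdd * i for i in currents)


# Hypothetical supply-current samples (time in ns, current in A).
times = [0.0, 1.0, 2.0, 3.0]
currents = [0.0, 2e-4, 2e-4, 0.0]

print(average_power(times, currents, vdd=1.0))
print(peak_power(currents, vdd=1.0))
```

The sketch treats current delivered by the supply as positive. SPICE-style simulators define source current as flowing into the positive terminal, so power measured on a supplying source comes out negative, which likely explains the negative `avg_power` values in the listings that follow.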

```
avg_power=-275u
max_power=128u
s0_afbh_rise_delay=72.3p
s0_afbl_fall_delay=44p
s0_arbh_fall_delay=593p
s0_arbl_rise_delay=45p
s0_bfah_rise_delay=72.4p
s0_bfal_fall_delay=569p
s0_brah_fall_delay=592p
s0_bral_rise_delay=49.2p
s1_afblcl_fall_delay=44.1p
s1_arblcl_rise_delay=45.3p
s1_bfalcl_fall_delay=-993p
s1_bralcl_rise_delay=48.9p
s1_cfahbh_fall_delay=-460p
s1_cfalbh_rise_delay=-1.47n
s1_crahbh_rise_delay=25.3p
s1_crahbh_fall_delay=49p
s2_afblcl_fall_delay=44p
s2_arblcl_rise_delay=45.3p
s2_bfalcl_fall_delay=-2.03n
s2_bralcl_rise_delay=48.8p
s2_cfahbh_fall_delay=-415p
s2_cfalbh_rise_delay=-2.47n
s2_crahbh_rise_delay=25.3p
s2_crahbh_fall_delay=49.3p
s3_arblcl_rise_delay=45.3p
s3_bfalcl_fall_delay=1E+30
s3_cfahbh_fall_delay=-2.02n
s3_cfalbh_rise_delay=1E+30
s3_crahbh_rise_delay=25.4p
s3_crahbh_fall_delay=49.8p
temper=70
```

## Proportional Scaling

```
avg_power=-474u
max_power=166u
s0_afbh_rise_delay=60.7p
s0_afbl_fall_delay=38.4p
s0_arbh_fall_delay=581p
s0_arbl_rise_delay=40.8p
s0_bfah_rise_delay=61p
s0_bfal_fall_delay=564p
s0_brah_fall_delay=581p
s0_bral_rise_delay=43.7p
s1_afblcl_fall_delay=39.3p
s1_arblcl_rise_delay=40.8p
s1_bfalcl_fall_delay=-998p
s1_bralcl_rise_delay=43.4p
s1_cfahbh_fall_delay=-468p
s1_cfalbh_rise_delay=-1.49n
s1_crahbh_rise_delay=22.9p
s1_crahbh_fall_delay=45.8p
s2_afblcl_fall_delay=39.3p
s2_arblcl_rise_delay=41.1p
s2_bfalcl_fall_delay=-2.04n
s2_bralcl_rise_delay=43.4p
s2_cfahbh_fall_delay=-431p
s2_cfalbh_rise_delay=-2.49n
s2_crahbh_rise_delay=22.7p
s2_crahbh_fall_delay=45.7p
s3_arblcl_rise_delay=41.1p
s3_bfalcl_fall_delay=1E+30
s3_cfahbh_fall_delay=-2.03n
s3_cfalbh_rise_delay=1E+30
s3_crahbh_rise_delay=22.8p
s3_crahbh_fall_delay=46.2p
temper=70
```

| Measurement | Value |
|-------------|-------|
| `avg_power` | -670u |
| `max_power` | 417u |
| `s0_afbh_rise_delay` | 57.1p |
| `s0_afbl_fall_delay` | 36.7p |
| `s0_arbh_fall_delay` | 577p |
| `s0_arbl_rise_delay` | 39.3p |
| `s0_bfah_rise_delay` | 58.1p |
| `s0_bfal_fall_delay` | 562p |
| `s0_brah_fall_delay` | 578p |
| `s0_bral_rise_delay` | 42p |
| `s1_afblcl_fall_delay` | 38.1p |
| `s1_arblcl_rise_delay` | 39.6p |
| `s1_bfalcl_fall_delay` | -1n |
| `s1_bralcl_rise_delay` | 42p |
| `s1_cfahbh_fall_delay` | -471p |
| `s1_cfalbh_rise_delay` | -1.49n |
| `s1_crahbh_rise_delay` | 22p |
| `s1_crahbh_fall_delay` | 44.8p |
| `s2_afblcl_fall_delay` | 38.1p |
| `s2_arblcl_rise_delay` | 39.7p |
| `s2_bfalcl_fall_delay` | -2.04n |
| `s2_bralcl_rise_delay` | 41.8p |
| `s2_cfahbh_fall_delay` | -436p |
| `s2_cfalbh_rise_delay` | -2.5n |
| `s2_crahbh_rise_delay` | 21.9p |
| `s2_crahbh_fall_delay` | 45p |
| `s3_arblcl_rise_delay` | 39.7p |
| `s3_bfalcl_fall_delay` | 1E+30 |
| `s3_cfahbh_fall_delay` | -2.03n |
| `s3_cfalbh_rise_delay` | 1E+30 |
| `s3_crahbh_rise_delay` | 21.8p |
| `s3_crahbh_fall_delay` | 45.5p |
| `temper` | 70 |

| Measurement | Value |
|-------------|-------|
| `avg_power` | -864u |
| `cout_cfahbl_fall_delay` | 8.31n |
| `cout_cfalbh_fall_delay` | 2.07n |
| `cout_crahbh_rise_delay` | 2.08n |
| `cout_crahbh_rise_delay` | 30.6p |
| `max_power` | 527u |
| `s0_afbh_rise_delay` | 56.2p |
| `s0_afbl_fall_delay` | 37.1p |
| `s0_arbh_fall_delay` | 575p |
| `s0_arbl_rise_delay` | 38.7p |
| `s0_bfah_rise_delay` | 56.1p |
| `s0_bfal_fall_delay` | 561p |
| `s0_brah_fall_delay` | 576p |
| `s0_bral_rise_delay` | 41.3p |
| `s1_afblcl_fall_delay` | 35.8p |
| `s1_arblcl_rise_delay` | 39.2p |
| `s1_bfalcl_fall_delay` | -1n |
| `s1_bralcl_rise_delay` | 41.2p |
| `s1_cfahbh_fall_delay` | -473p |
| `s1_cfalbh_rise_delay` | -1.5n |
| `s1_crahbh_rise_delay` | 21.5p |
| `s1_crahbh_fall_delay` | 44.4p |
| `s2_afblcl_fall_delay` | 37.4p |
| `s2_arblcl_rise_delay` | 39.1p |
| `s2_bfalcl_fall_delay` | -2.04n |
| `s2_bralcl_rise_delay` | 41.2p |
| `s2_cfahbh_fall_delay` | -438p |
| `s2_cfalbh_rise_delay` | -2.5n |
| `s2_crahbh_rise_delay` | 21.5p |
| `s2_crahbh_fall_delay` | 44.4p |
| `s3_arblcl_rise_delay` | 39.1p |
| `s3_bfalcl_fall_delay` | 1E+30 |
| `s3_cfahbh_fall_delay` | -2.03n |
| `s3_cfalbh_rise_delay` | 1E+30 |
| `s3_crahbh_rise_delay` | 21.2p |
| `s3_crahbh_fall_delay` | 45.1p |
| `temper` | 70 |

## PMOS only scaling

| Measurement | Value |
|-------------|-------|
| `avg_power` | -382u |
| `max_power` | 241u |
| `s0_afbh_rise_delay` | 53.1p |
| `s0_afbl_fall_delay` | 36.1p |
| `s0_arbh_fall_delay` | 573p |
| `s0_arbl_rise_delay` | 37.6p |
| `s0_bfah_rise_delay` | 53.2p |
| `s0_bfal_fall_delay` | 560p |
| `s0_brah_fall_delay` | 573p |
| `s0_bral_rise_delay` | 39.9p |
| `s1_afblcl_fall_delay` | 36.2p |
| `s1_arblcl_rise_delay` | 38.1p |
| `s1_bfalcl_fall_delay` | -1n |
| `s1_bralcl_rise_delay` | 39.8p |
| `s1_cfahbh_fall_delay` | -474p |
| `s1_cfalbh_rise_delay` | -1.5n |
| `s1_crahbh_rise_delay` | 20.7p |
| `s1_crahbh_fall_delay` | 43.3p |
| `s2_afblcl_fall_delay` | 36.2p |
| `s2_arblcl_rise_delay` | 38.1p |
| `s2_bfalcl_fall_delay` | -2.04n |
| `s2_bralcl_rise_delay` | 39.8p |
| `s2_cfahbh_fall_delay` | -441p |
| `s2_cfalbh_rise_delay` | -2.51n |
| `s2_crahbh_rise_delay` | 20.7p |
| `s2_crahbh_fall_delay` | 43.3p |
| `s3_arblcl_rise_delay` | 38.1p |
| `s3_bfalcl_fall_delay` | 1E+30 |
| `s3_cfahbh_fall_delay` | -2.04n |
| `s3_cfalbh_rise_delay` | 1E+30 |
| `s3_crahbh_rise_delay` | 20.6p |
| `s3_crahbh_fall_delay` | 43.9p |
| `temper` | 70 |

avg\_power=-480u

max\_power=142u  
s0\_afbh\_rise\_delay=44.7p  
s0\_afbl\_fall\_delay=33.1p  
s0\_arbh\_fall\_delay=567p  
s0\_arbl\_rise\_delay=36p  
s0\_bfah\_rise\_delay=45.8p  
s0\_bfal\_fall\_delay=558p  
s0\_brah\_fall\_delay=566p  
s0\_bral\_rise\_delay=37.6p  
s1\_afblcl\_fall\_delay=33.1p  
s1\_arblcl\_rise\_delay=36.8p  
s1\_bfalcl\_fall\_delay=-1n  
s1\_bralcl\_rise\_delay=37.6p  
s1\_cfahbh\_fall\_delay=-478p  
s1\_cfalbh\_rise\_delay=-1.5n  
s1\_crahbh\_rise\_delay=19.6p  
s1\_crahbh\_fall\_delay=42.1p

s2\_afblcl\_fall\_delay=33.1p  
s2\_arblcl\_rise\_delay=36.8p  
s2\_bfalcl\_fall\_delay=-2.04n  
s2\_bralcl\_rise\_delay=37.6p  
s2\_cfahbh\_fall\_delay=-448p  
s2\_cfalbh\_rise\_delay=-2.52n  
s2\_crahbh\_rise\_delay=19.4p  
s2\_crahbh\_fall\_delay=42.4p  
s3\_arblcl\_rise\_delay=36.8p  
s3\_bfalcl\_fall\_delay=1E+30  
s3\_cfahbh\_fall\_delay=-2.04n  
s3\_cfalbh\_rise\_delay=1E+30  
s3\_crahbh\_rise\_delay=19.5p  
s3\_crahbh\_fall\_delay=42.6p  
temper=70

avg\_power=-564u

max\_power=158u  
s0\_afbh\_rise\_delay=42.6p  
s0\_afbl\_fall\_delay=32.5p  
s0\_arbh\_fall\_delay=563p  
s0\_arbl\_rise\_delay=36p  
s0\_bfah\_rise\_delay=45.1p  
s0\_bfal\_fall\_delay=557p  
s0\_brah\_fall\_delay=565p  
s0\_bral\_rise\_delay=37.1p  
s1\_afblcl\_fall\_delay=32.3p  
s1\_arblcl\_rise\_delay=36.8p  
s1\_bfalcl\_fall\_delay=-1.01n  
s1\_bralcl\_rise\_delay=37.1p  
s1\_cfahbh\_fall\_delay=-481p  
s1\_cfalbh\_rise\_delay=-1.51n  
s1\_crahbh\_rise\_delay=19.4p  
s1\_crahbh\_fall\_delay=42.7p

s2\_afblcl\_fall\_delay=32.4p  
s2\_arblcl\_rise\_delay=36.7p  
s2\_bfalcl\_fall\_delay=-2.05n  
s2\_bralcl\_rise\_delay=37.1p  
s2\_cfahbh\_fall\_delay=-453p  
s2\_cfalbh\_rise\_delay=-2.52n  
s2\_crahbh\_rise\_delay=19.3p  
s2\_crahbh\_fall\_delay=42.5p  
s3\_arblcl\_rise\_delay=36.8p  
s3\_bfalcl\_fall\_delay=1E+30  
s3\_cfahbh\_fall\_delay=-2.04n  
s3\_cfalbh\_rise\_delay=1E+30  
s3\_crahbh\_rise\_delay=19.2p  
s3\_crahbh\_fall\_delay=43.2p  
temper=70

## NMOS scaling

avg\_power=-459u  
max\_power=176u  
s0\_afbh\_rise\_delay=106p  
s0\_afbl\_fall\_delay=58.7p  
s0\_arbh\_fall\_delay=628p  
s0\_arbl\_rise\_delay=58.9p  
s0\_bfah\_rise\_delay=106p  
s0\_bfal\_fall\_delay=-934p  
s0\_brah\_fall\_delay=626p  
s0\_bral\_rise\_delay=65.5p  
s1\_afblcl\_fall\_delay=58.6p  
s1\_arblcl\_rise\_delay=58.2p  
s1\_bfalcl\_fall\_delay=-978p  
s1\_bralcl\_rise\_delay=65.5p  
s1\_cfahbh\_fall\_delay=-435p  
s1\_cfalbh\_rise\_delay=-1.43n  
s1\_crahbh\_rise\_delay=32.5p  
s1\_crahbh\_fall\_delay=60p  
s2\_afblcl\_fall\_delay=59p  
s2\_arblcl\_rise\_delay=58.2p  
s2\_bfalcl\_fall\_delay=-2.02n  
s2\_bralcl\_rise\_delay=65.5p  
s2\_cfahbh\_fall\_delay=-371p  
s2\_cfalbh\_rise\_delay=-2.41n  
s2\_crahbh\_rise\_delay=32.4p  
s2\_crahbh\_fall\_delay=60.4p  
s3\_arblcl\_rise\_delay=59p  
s3\_bfalcl\_fall\_delay=1E+30  
s3\_cfahbh\_fall\_delay=-2n  
s3\_cfalbh\_rise\_delay=1E+30  
s3\_crahbh\_rise\_delay=33p  
s3\_crahbh\_fall\_delay=60.8p  
temper=70

---

avg\_power=-538u  
max\_power=245u  
s0\_afbh\_rise\_delay=123p  
s0\_afbl\_fall\_delay=65.6p  
s0\_arbh\_fall\_delay=643p  
s0\_arbl\_rise\_delay=65.2p  
s0\_bfah\_rise\_delay=122p  
s0\_bfal\_fall\_delay=-917p  
s0\_brah\_fall\_delay=642p  
s0\_bral\_rise\_delay=73.3p  
s1\_afblcl\_fall\_delay=65.6p  
s1\_arblcl\_rise\_delay=64.7p  
s1\_bfalcl\_fall\_delay=-971p  
s1\_bralcl\_rise\_delay=74.4p  
s1\_cfahbh\_fall\_delay=-424p  
s1\_cfalbh\_rise\_delay=-1.41n  
s1\_crahbh\_rise\_delay=36p  
s1\_crahbh\_fall\_delay=64.8p  
s2\_afblcl\_fall\_delay=65.6p  
s2\_arblcl\_rise\_delay=64.7p  
s2\_bfalcl\_fall\_delay=-2.01n  
s2\_bralcl\_rise\_delay=74.4p  
s2\_cfahbh\_fall\_delay=-350p  
s2\_cfalbh\_rise\_delay=-2.38n  
s2\_crahbh\_rise\_delay=35.2p  
s2\_crahbh\_fall\_delay=64.8p  
s3\_arblcl\_rise\_delay=64.7p  
s3\_bfalcl\_fall\_delay=1E+30  
s3\_cfahbh\_fall\_delay=-1.98n  
s3\_cfalbh\_rise\_delay=1E+30  
s3\_crahbh\_rise\_delay=35.2p  
s3\_crahbh\_fall\_delay=65.1p  
temper=70

## **Results**

### **Functional Verification**

The full waveform plot (see Testbench and Input Patterns) shows that:

- S0–S3 and C3 (COUT) correctly reflect the binary sum of A and B for all test vectors.
- When the sum exceeds 15, the carry ripples through the chain and C3 asserts.

This confirms the logical correctness of the NAND-based 4-bit RCA.

### **Worst-Case COUT Delay**

From the COUT measurements, the worst positive propagation delays for COUT were extracted. Example subset:

- Baseline (1× both): rise ≈ 32.3 ps, fall ≈ 31.0 ps
- Both scaled 2×: rise ≈ 30.8 ps, fall ≈ 27.9 ps
- Both scaled 3×: rise ≈ 40.1 ps, fall ≈ 46.3 ps
- Both scaled 4×: rise ≈ 49.0 ps, fall ≈ 63.9 ps

For PMOS-only scaling, delays steadily improve:

P2×: rise ≈ 29.8 ps, fall ≈ 26.2 ps

P3×: rise ≈ 27.9 ps, fall ≈ 22.2 ps

P4×: rise ≈ 27.5 ps, fall ≈ 20.2 ps

For NMOS-only scaling, delays actually worsen, reaching ~44–56 ps.



Proportional scaling initially reduces delay ($1\times \rightarrow 2\times$), but beyond that, increasing capacitance dominates and delay grows again.

PMOS-only scaling gives the best and most consistent delay improvement.

NMOS-only scaling is the worst; the PMOS network becomes the bottleneck.

## Power Results

From the `avg_power` values (magnitudes, since the listings report supply power as negative):

| Strategy    | Scale | Average power (µW) |
|-------------|-------|--------------------|
| Baseline    | 1×    | 275                |
| Both scaled | 2×    | 474                |
| Both scaled | 3×    | 670                |
| Both scaled | 4×    | 864                |
| PMOS-only   | 2×    | 538                |
| PMOS-only   | 3×    | 382                |
| PMOS-only   | 4×    | 480                |
| NMOS-only   | 4×    | 564                |

Figure: Average power vs. proportional scaling.

Figure: Average power vs. scaling (PMOS-only).


### **Interpretation:**

Proportional scaling significantly increases dynamic power: more width → higher load capacitance on every node → more switching energy.

PMOS-only scaling has a moderate power overhead: at 3× the average power (382 µW) stays below every proportionally scaled case while delay improves.

NMOS-only scaling is again inefficient: relatively high power with no speed benefit.
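As a first-order reference, the link between width and switching power can be written with the standard textbook approximation for dynamic power. This expression is not extracted from the simulations; the symbols $\alpha$ (switching activity), $C$ (switched capacitance), and $f$ (switching frequency) are introduced here for illustration only.

$$P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f$$

Gate and diffusion capacitance grow roughly in proportion to transistor width, so scaling every device by the same factor scales $C$, and with it $P_{\text{dyn}}$, by roughly that factor. This is in line with the steady rise from 275 µW at 1× to 864 µW at 4× under proportional scaling.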

### **Power-Delay Product (PDP)**

The PDP was computed as:

$$\text{PDP} = \text{Average Power} \times \text{Worst-Case COUT Delay}$$

(Using COUT rise delay and average power for each case.)

#### **For proportional scaling:**

| Scale | PDP (fJ) |
|-------|----------|
| 1x    | 8.9      |
| 2x    | 14.6     |
| 3x    | 26.9     |
| 4x    | 42.3     |

#### **For PMOS-only scaling:**

| Scale | PDP (fJ) |
|-------|----------|
| 2x    | 16.0     |
| 3x    | 10.7     |
| 4x    | 13.2     |

#### **For NMOS-only scaling:**

4×: PDP $\approx$ 25.1 fJ.
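The proportional and PMOS-only table entries above can be reproduced from the reported average power and COUT rise delay. The Python sketch below copies those values from the Results section; the dictionary layout and the function name are illustrative. The NMOS-only case is left out because its exact rise delay is only quoted approximately in the text.

```python
# (average power in µW, COUT rise delay in ps) as reported in the Results section.
reported = {
    "both 1x": (275, 32.3),
    "both 2x": (474, 30.8),
    "both 3x": (670, 40.1),
    "both 4x": (864, 49.0),
    "PMOS 2x": (538, 29.8),
    "PMOS 3x": (382, 27.9),
    "PMOS 4x": (480, 27.5),
}


def pdp_fj(power_uw, delay_ps):
    """Power-delay product in fJ: 1 µW x 1 ps = 1e-18 J = 1e-3 fJ."""
    return power_uw * delay_ps * 1e-3


pdp_table = {case: round(pdp_fj(p, d), 1) for case, (p, d) in reported.items()}

for case, value in pdp_table.items():
    print(f"{case}: {value} fJ")
```

Keeping the unit conversion in one place makes it easy to recompute the table if a different delay (for example the fall delay) is chosen as the worst case.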





#### Important points:

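For proportional scaling, PDP increases monotonically with the scale factor (8.9 fJ at 1× to 42.3 fJ at 4×).

For PMOS-only scaling, PDP reaches its minimum at $P=3\times$ (10.7 fJ), the lowest of the upsized designs, though still above the 8.9 fJ baseline.

NMOS-only scaling again performs poorly, with a high PDP (about 25.1 fJ at 4×).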

## **Discussion**

### **Which Sizing Strategy Optimized Speed?**

The lowest COUT delays were measured with PMOS-only scaling at $3\text{--}4\times$ (27.5 ps rise and 20.2 ps fall at $4\times$). Modest proportional scaling ($2\times$) also improves on the baseline, but by less (30.8 ps rise, 27.9 ps fall). This is because:

Wider PMOS transistors compensate for low hole mobility, speeding up the pull-up path.

Wider NMOS transistors speed up the pull-down path, but in these measurements the added capacitance outweighed that gain.

Proportional scaling initially reduces delay as drive strength dominates, but beyond $2\times$ the extra capacitance slows down edges.

So if the absolute minimum delay is the main goal (ignoring power), aggressive PMOS-only scaling is the best choice.

### **Which Sizing Strategy Minimized PDP?**

PDP balances speed and power. Among the upsized designs, the lowest PDP (10.7 fJ) occurs at PMOS scaling $3\times$:

Delay is improved compared to baseline (27.9 ps vs. 32.3 ps COUT rise).

Average power (382 µW) remains low compared with heavy proportional scaling (670–864 µW).

This means PMOS-only $3\times$ scaling provides the best energy per operation of the upsized designs, at a modest PDP increase over the 8.9 fJ baseline.

### **PMOS vs NMOS Strength Effects**

In a NAND gate:

The pull-up network (PMOS in parallel) sets the speed of rising edges.

The pull-down network (series NMOS) sets the speed of falling edges.

### **Therefore:**

If we only scale NMOS, falling edges may become slightly faster, but rising edges remain limited by the weak PMOS devices. The resulting imbalance and added capacitance can even increase overall delay.

If we scale PMOS, both rising delay and overall balance improve, producing better delay and PDP.

This explains why PMOS-only scaling outperforms NMOS-only scaling in both delay and PDP experiments.

## **Overall Power–Performance Trade-Offs**

This project illustrates a classic VLSI trade-off:

Larger transistors → lower resistance → lower delay

But also larger capacitance → higher dynamic power and eventually higher delay beyond a point.

### **Our results show:**

There is a sweet spot where slight upsizing improves performance without excessive power (PMOS $3\times$).

Beyond that, delay either improves only marginally (PMOS $4\times$) or degrades (proportional $3\text{--}4\times$), and energy per operation increases.

## Conclusion

Proportional scaling reduces delay only for small factors and greatly increases power and PDP for larger factors.

NMOS-only scaling is ineffective; it increases power and does not improve delay.

PMOS-only scaling, particularly at $3\times$ width, offers the best energy efficiency of the upsized designs (lowest PDP) while still improving delay relative to baseline.

This matches theoretical expectations about CMOS transistor behavior and demonstrates how careful device sizing is essential for balancing speed, power, and energy in modern VLSI design.