

## CMOS Inverter

DC Analysis:  $\frac{v_o}{v_i} = \frac{1}{(n-1)(1 + \omega n)}$

It is a pure steady-state analysis,  
so capacitances play no role.



Vout

|          |   |  |  |
|----------|---|--|--|
| Page No. | 1 |  |  |
| Date     |   |  |  |



Even when $V_{DD}$ falls below the threshold voltage of the NMOS or PMOS, the circuit still acts as an inverter, because of the leakage current that flows when there is no channel.

$$\frac{2kT}{q} < V_{Th} + |V_{Tp}| < V_{DD}$$

Current flows due to subthreshold leakage.

### Adder

Ripple Carry adder of N bit:

$$t_{\text{delay}} = (N-1)t_{\text{carry}} + t_{\text{sum}}$$

Full Adder:

| A | B | Cin | Sum | Carry | Carry Status     |
|---|---|-----|-----|-------|------------------|
| 0 | 0 | 0   | 0   | 0     | Carry killed     |
| 0 | 0 | 1   | 1   | 0     | Carry killed     |
| 0 | 1 | 0   | 1   | 0     | Carry propagated |
| 0 | 1 | 1   | 0   | 1     | Carry propagated |
| 1 | 0 | 0   | 1   | 0     | Carry propagated |
| 1 | 0 | 1   | 0   | 1     | Carry propagated |
| 1 | 1 | 0   | 0   | 1     | Carry generated  |
| 1 | 1 | 1   | 1   | 1     | Carry generated  |

Let's define 3 new variables  $\rightarrow$

$$\text{Generate } (G) = AB$$

$$\text{Propagate } (P) = A \oplus B \quad (\text{for the carry, this can be implemented as } A+B)$$

$$\text{Kill } (K) = \bar{A}\bar{B}$$

$$C_{\text{out}} = G + P C_{\text{in}}$$

$$S = P \oplus C_{\text{in}}$$
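As a quick check of the generate/propagate formulation, the sketch below builds a full adder from $G$ and $P$, taking the sum as $P \oplus C_{in}$, and compares it with ordinary addition for all eight input combinations. The function name `full_adder_gp` is only illustrative.

```python
from itertools import product

def full_adder_gp(a, b, cin):
    """Full adder written in terms of generate (G) and propagate (P)."""
    g = a & b                 # generate: carry out regardless of cin
    p = a ^ b                 # propagate: carry out equals cin
    cout = g | (p & cin)      # C_out = G + P*C_in
    s = p ^ cin               # S = P xor C_in
    return s, cout

# Exhaustive check against integer addition.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder_gp(a, b, cin)
    assert 2 * cout + s == a + b + cin
```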

- (\*) The conventional full adder takes 28 transistors in static CMOS logic.

In this implementation the logical effort (L.E.) of the inputs is

$$A = x; B = y; A_i = z; C_{\text{in}} = 2$$

- (\*) To reduce the logical effort we can use the mirror full adder implementation.

In this implementation the logical effort (L.E.) of the inputs is

$$A = x; B = y; C_{\text{in}} = 2 \quad (\text{Save})$$

Q → Consider a NAND gate.



Draw the CMOS equivalent of this.  
In the NMOS stack, will you connect A near the output node or near ground?

Sol - First look at the time taken when  $B=1$  &  $A=0$ .  
There will be charge redistribution.



(II) When  $B=0$  &  $A=0$

Then  $M_1$  will discharge  $C_2$  to ground.

When  $A=0 \rightarrow 1$  &  $B=1$

Then  $M_1$  &  $M_2$  need to discharge only  $C_1$  to ground. So this takes less time.

Hence,

Carry signals are critical, so the transistors connected to  $C_i$  are placed closest to the output.

- (1) Ripple Adder
- (2) Carry-bypass Adder
- (3) Linear Select Adder
- (4) Square root Adder
- (5) Carry-select Adder

- Carry-bypass (Linear select) ✓
- Carry-select (square root)
- Look-ahead
- Carry select adder

Ans 2B

After all these adders, can we have an adder whose output delay is independent of the carry input?

Such adders are called carry lookahead adders.

For Example for 2 bits,

$$P_{1:0} = P_1 \cdot P_0$$

$$G_{1:0} = G_1 + P_1 \cdot G_0$$

$$C_{out} = G_{1:0} + P_{1:0} \cdot C_{in}$$

$$= (G_1 + P_1 \cdot G_0) + (P_1 \cdot P_0) \cdot C_{in}$$

$$C_{out} = A_1 B_1 + (A_1 \oplus B_1)(A_0 B_0) + (A_1 \oplus B_1)(A_0 \oplus B_0) C_{in}$$

This carry out is independent of carry propagation delay.
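The 2-bit lookahead expression can be checked by brute force. The sketch below evaluates $C_{out} = G_{1:0} + P_{1:0} C_{in}$ directly from the inputs and compares it with the carry of a real 2-bit addition; the function name `cla2_carry` is only illustrative.

```python
from itertools import product

def cla2_carry(a1, a0, b1, b0, cin):
    """Carry out of a 2-bit adder computed with group generate/propagate."""
    g0, p0 = a0 & b0, a0 ^ b0
    g1, p1 = a1 & b1, a1 ^ b1
    g10 = g1 | (p1 & g0)      # G_{1:0} = G1 + P1*G0
    p10 = p1 & p0             # P_{1:0} = P1*P0
    return g10 | (p10 & cin)  # C_out = G_{1:0} + P_{1:0}*C_in

# Compare with the carry (bit 2) of the integer sum for all 32 cases.
for a1, a0, b1, b0, cin in product((0, 1), repeat=5):
    total = (2 * a1 + a0) + (2 * b1 + b0) + cin
    assert cla2_carry(a1, a0, b1, b0, cin) == (total >> 2)
```

No intermediate carry has to ripple through: the output depends only on the primary inputs and $C_{in}$.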



For 8 bits, we use this 2 bit structure with the previous four bit structure.



We are constantly working to minimize the carry propagation delay in our adder.

Q → Why don't we use such a number system which never generates a carry?

$\{0, 1, \bar{1}\}$  → Redundant binary or  
canonical signed digits (CSD), where $\bar{1}$ denotes $-1$

Eg → 7: 0111 (Binary)

Whenever you encounter 1's in series, convert the run into redundant binary form.

7: $100\bar{1}$ (Binary redundant), i.e. $8 - 1$

Eg → 011011001 (Binary)

↓  
$01110\bar{1}001$

↓  
$100\bar{1}0\bar{1}001$ (Binary redundant)
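The run-replacement rule can be sketched in code. The version below uses the standard non-adjacent-form recurrence, which is one common way to obtain canonical signed digits and is not necessarily the procedure used in class; `to_csd` and `csd_value` are illustrative names, and $-1$ stands for $\bar{1}$.

```python
def to_csd(n):
    """Signed digits (MSB first) of n > 0 in canonical signed-digit form."""
    digits = []
    while n > 0:
        if n & 1:
            # n % 4 == 1 -> isolated 1, digit +1
            # n % 4 == 3 -> start of a run of 1s, digit -1 and carry upward
            d = 2 - (n % 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits[::-1]

def csd_value(digits):
    """Evaluate a signed-digit string (MSB first) back to an integer."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value

print(to_csd(7))            # digits of 7
print(to_csd(0b011011001))  # digits of the second example
```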

## Multiplier

$$(n \text{ bits}) \times (m \text{ bits}) = (m+n) \text{ bits}$$

✓ Booth Multiplier  
Normal Multiplier  
Wallace Multiplier

## Memories

### Registers versus Memory:

- Registers: very small memory. Used to store data bits.
- Memory: large storage space with high density. Used to store instructions.

### How to store a bit?

(1) Feedback Mechanism (Static)

(2) Capacitor using charge mechanism (Dynamic)

Static Memory or SRAM



Stuck at V<sub>DD</sub>



Ring Oscillators



Bit Storage Element

Dynamic Memory or DRAM

It is impossible to build an isolated capacitor.

We need a transistor to read or write data.

### ③ Flash memory or non volatile

→ Stores data even after supply voltage is off.

→ Done by changing some parameters of our transistor.

→ Change the threshold



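| Read-Write Memory (Random Access) | Read-Write Memory (Non-random Access) | Non-Volatile Read-Write Memory | Read-Only Memory    |
|-----------------------------------|---------------------------------------|--------------------------------|---------------------|
| SRAM                              | FIFO                                  | EPROM                          | Mask-Programmed     |
| DRAM                              | LIFO                                  | E<sup>2</sup>PROM              | Programmable (PROM) |
|                                   | Shift-Register                        | FLASH                          |                     |
|                                   | CAM                                   |                                |                     |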

Read data from 1

Written data to 10.

Slow write

Fast Read

## SRAM



- Data stored as long as supply is applied.
- Large (6 transistors/cell)
- Fast
- Differential o/p.

3 possible operating points (two stable, one metastable) →



## SRAM in metastable state



A, B, C → 3 operating points  
A, B → Stable points; C → Metastable point

If the o/p is at C then  $O/P = V_m$ .  
But if the o/p moves by  $\pm \Delta$  away from C, it changes to 0 or 1 very rapidly.

### Writing a SRAM:



Initially the SRAM has 1 stored.  
Now we want to store 0.

Can we use 1 MOS with the write signal on its gate?

No, because at node A both 0 and 1 would be driven at the same time, so the node is left in an indeterminate state.

### Way 2:



This circuit works fine, but we have introduced an extra MOS transistor.

### Way 3:



Similar to way 1

At node A, the M1 transistor wants to dec. the voltage while M2 wants to inc. the voltage.

So, we can use ratioed logic to change the relative strengths of M1 and M2. Meaning,

if we dec. the resistance of M1 and inc. the resistance of M2, then M1 can overpower M2 and we can write into the SRAM.

→ We generally don't follow this method because the transistors end up with different, larger sizes.

### Way 4:

'E' path



∴ Total 6 transistors

This way is used practically

CMOS Implementation:




BL

a) WL = 1, P = 0  $\Rightarrow$  A path is formed between BL and VDD. This is called a 'read operation'.  
b) WL = 0, P = 1  $\Rightarrow$  A path is formed between BL and VSS. This is called a 'write operation'.  
c) WL = 1, P = 1  $\Rightarrow$  A path is formed between BL and VDD. This is called a 'read operation'.  
d) WL = 0, P = 0  $\Rightarrow$  A path is formed between BL and VSS. This is called a 'write operation'.

Read Operation:

Reading 1:



Reading 1 means charging the capacitance of BL by enabling M6 and transferring the charge from capacitance of M4 and M3, to the capacitor of BL.

Reading 0:



Reading 0 means discharging the capacitor of BL through M6.  
This is a delicate task: if  $C_{BL}$  was initially charged, then as soon as we switch on M6,  $C_{BL}$  starts charging  $C_{D,M3}$ , and because the bit line is very long (1024 bits) its capacitance is also large.

Inverter Delay analysis

for calculating  $t_{pHL}$  and  $t_{pLH}$

$$D_{out} = \frac{V_{DD}}{2} \ln \frac{I_{DSAT}}{I_{DS}}$$



Assume  $V_{DD} \gg V_{DSAT}$ , so our transistor is always in velocity saturation.

The red line above is not a resistor for our approximation.

It looks like a current source.

(\*) In saturation, transistor acts like a current source



Here, we have assumed current to be constant, which is not completely correct.

- ✳ We can make our calculations more accurate by taking the average of the currents at  $V_{DS} = V_{DD}$  and  $V_{DS} = \frac{V_{DD}}{2}$

$$I_{DSAT,avg} = \frac{1}{2} \left[ I_{DSAT}(1 + \lambda V_{DD}) + I_{DSAT}\left(1 + \lambda\frac{V_{DD}}{2}\right) \right]$$

$$= I_{DSAT} \left[ 1 + \frac{3}{4} \lambda V_{DD} \right]$$

We can model this as a combination of a current source  $I_{DSAT}$  & a resistor  $R = \frac{1}{\lambda I_{DSAT}}$ .

Solving the RC circuit,

$$V_{out}(t) = \left( V_{DD} + \frac{1}{\lambda} \right) e^{-\frac{\lambda I_{DSAT}}{C} t} - \frac{1}{\lambda}$$

For small  $\lambda$:

$$t_p = \frac{C V_{DD}}{2 I_{DSAT} (1 + \lambda V_{DD})}$$

and for  $\lambda \rightarrow 0$:

$$t_p = \frac{C V_{DD}}{2 I_{DSAT}}$$
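To get a feel for how much channel-length modulation changes the delay, here is a minimal numerical sketch. It assumes the discharge current is $I_{DSAT}(1 + \lambda V_{out})$, i.e. a current source in parallel with $R = 1/(\lambda I_{DSAT})$, and solves for the time at which $V_{out}$ reaches $V_{DD}/2$. All component values are hypothetical.

```python
import math

def tp_constant_current(c, vdd, idsat):
    """Delay with a pure current source: t_p = C*VDD / (2*IDSAT)."""
    return c * vdd / (2 * idsat)

def tp_with_lambda(c, vdd, idsat, lam):
    """Time for V_out to fall from VDD to VDD/2 when I = IDSAT*(1 + lam*V_out).

    Obtained by integrating C dV/dt = -IDSAT*(1 + lam*V).
    """
    return c / (lam * idsat) * math.log((1 + lam * vdd) / (1 + lam * vdd / 2))

# Hypothetical numbers: 10 fF load, 1.2 V supply, 100 uA, lambda = 0.1 /V.
C, VDD, IDSAT, LAM = 10e-15, 1.2, 100e-6, 0.1
print(tp_constant_current(C, VDD, IDSAT))
print(tp_with_lambda(C, VDD, IDSAT, LAM))
```

With $\lambda > 0$ the extra current shortens the delay slightly, and as $\lambda \rightarrow 0$ the two expressions agree.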

### RC Model



We can see that RC model is approximately equal to NMOS. There are two approaches to calculate R to use the RC model.

#### Approach 1:

We can find the area of shaded region by integration to find R.



Or, approximate it by averaging the resistance at the two end points,

$$R = \frac{R(\text{at } V_{DS} = V_{DD}) + R(\text{at } V_{DS} = \frac{V_{DD}}{2})}{2}$$

#### Approach 2:

$$t_p = t_{p,RC}$$

$$\frac{CV_{DD}}{2I_{D,SAT}(1+\lambda V_{DD})} = (\ln 2) R_{req} C$$

$$R_{req} = \frac{V_{DD}}{2I_{D,SAT}(\ln 2)(1+\lambda V_{DD})}$$

\* Both the approaches will give the same delay.
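The two approaches can be compared numerically. The sketch below assumes $I_D = I_{DSAT}(1 + \lambda V_{DS})$ in saturation (with $\lambda$ the channel-length modulation coefficient) and uses hypothetical values; for these numbers the two estimates land within roughly ten percent of each other.

```python
import math

def r_average(vdd, idsat, lam):
    """Approach 1: average of V/I at V_DS = VDD and at V_DS = VDD/2."""
    r_full = vdd / (idsat * (1 + lam * vdd))
    r_half = (vdd / 2) / (idsat * (1 + lam * vdd / 2))
    return (r_full + r_half) / 2

def r_matched(vdd, idsat, lam):
    """Approach 2: choose R so that ln(2)*R*C equals the current-source delay."""
    return vdd / (2 * idsat * math.log(2) * (1 + lam * vdd))

# Hypothetical device: 1.2 V supply, 100 uA saturation current, lambda = 0.1 /V.
print(r_average(1.2, 100e-6, 0.1))
print(r_matched(1.2, 100e-6, 0.1))
```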

Transistor as a switch

$R_{req}$



$$(I_{DSS} - I_D) + (I_D - I_{DS}) = K(V_{GS} - V_{TH}) = R$$

For  $I_{DS} = 0$

New Capacitors,



$$C_L = C_{dB,p} + C_{dB,n} + C_o + C_{g,p} + C_{g,n}$$

And,

$C_{dB,p}$  can be split as  
Miller Capacitance

## Power Consumption in CMOS

### ① Switching Power / Dynamic power -


Charging / Discharging capacitors.

$$\text{Power} = \begin{cases} C V_{DD}^2 f, & \text{if } V_{swing} = V_{DD} \\ C V_{swing} V_{DD} f, & \text{if } V_{swing} < V_{DD} \end{cases}$$

→  $f$  is the no. of  $0 \rightarrow 1$  transitions per second, i.e. how many times we switch the capacitors  
(which may not be equal to the clock freq.)

### ② Leakage power -

Transistors are not perfect switches.

### ③ Short-circuit power -

Both the pull-up & pull-down networks are ON during a transition, so a direct-path current flows from  $V_{DD}$  to ground.

If rise time or fall time is large then this power dissipation becomes large.

### ④ Static currents -

Bias currents due to rev. biased diodes.

### \* Energy consumed in N cycles, $E_N$

$$E_N = C_L V_{DD}^2 \cdot n_{0 \rightarrow 1}$$

$n_{0 \rightarrow 1} \rightarrow$  No. of  $0 \rightarrow 1$  transitions in N cycles.

And,

$$P_{avg} = C_L V_{DD}^2 f \cdot \alpha_{0 \rightarrow 1}$$

$\alpha_{0 \rightarrow 1} \rightarrow$  probability of a  $0 \rightarrow 1$  transition, i.e. how many caps switch out of the total caps
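The energy and average-power expressions are linked through the activity factor. The sketch below, with hypothetical numbers and illustrative function names, computes $E_N$ for a given number of $0 \rightarrow 1$ transitions and checks that dividing by the elapsed time gives the same result as $C_L V_{DD}^2 f$ times the switching probability.

```python
def switching_energy(c_load, vdd, n_transitions):
    """E_N = C_L * VDD^2 * (number of 0->1 transitions)."""
    return c_load * vdd ** 2 * n_transitions

def average_dynamic_power(c_load, vdd, f_clk, activity):
    """P_avg = C_L * VDD^2 * f * (0->1 transition probability)."""
    return c_load * vdd ** 2 * f_clk * activity

# Hypothetical case: 1 fF node, 1 V supply, 1 GHz clock,
# 10 rising transitions observed over N = 100 cycles.
C_L, VDD, F_CLK, N, N_RISE = 1e-15, 1.0, 1e9, 100, 10
energy = switching_energy(C_L, VDD, N_RISE)
power = average_dynamic_power(C_L, VDD, F_CLK, N_RISE / N)
print(energy, power, energy / (N / F_CLK))
```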

### ⑤ Suppose we give an input to an inverter.



If  $C_L$  is very large, the output takes a very long time to react to input changes. Hence, during the input transition approximately 0 short-circuit current flows.

If  $C_L$  is very small, then  $C_L$  is charged rapidly. Hence a large short-circuit current flows through the circuit.



So there is a tradeoff,

fast ckt  $\rightarrow$  more power consumption.

To further dec. the short circuit power:

Make  $C_L$  large so that the output rise/fall is slow and the direct-path current  $\rightarrow 0$ .

Make the rise time & fall time slopes equal.

To drive a small load, use a small transistor.

(2) Static / Leakage power  $\rightarrow$  Steady-state power  $\rightarrow$



When  $0 < V_{out} < V_{DD}$ , the drain diodes are reverse biased, so a reverse leakage current flows through them.

These leakage currents can discharge the capacitor. This becomes important in the case of DRAM, where the stored data gets lost and has to be refreshed at regular intervals.

$$I_{leak} = I_s \exp\left(\frac{V_D}{V_T}\right)$$

$I_{leak}$  doubles for every  $9^\circ C$  rise in temperature.
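The doubling rule is easy to turn into a scaling estimate. The sketch below assumes the leakage simply doubles every 9 °C around a reference temperature; the reference current and temperatures are hypothetical.

```python
def leakage_at(i_ref, t_ref_c, t_c, doubling_c=9.0):
    """Scale a reference leakage current, assuming it doubles every `doubling_c` deg C."""
    return i_ref * 2 ** ((t_c - t_ref_c) / doubling_c)

# Hypothetical: 1 nA at 25 C, evaluated at a few higher temperatures.
for temp in (25, 34, 52, 85):
    print(temp, leakage_at(1e-9, 25, temp))
```

Three doublings (27 °C) already give an eightfold increase, which is why leakage matters so much in hot chips.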



Q → Why does current flow even when there is no channel, i.e. below  $V_{th}$ ?



Without the channel the MOS structure can be viewed as a BJT.

However, a BJT is doped like  $n^+/p/n^-$ ; the MOS structure is not, but it is still assumed to behave as a BJT.

$C_{ox} \rightarrow$  Capacitance due to oxide

$C_d \rightarrow$  depletion region capacitance

So this is like a capacitive divider. The input voltage ( $V_g$ ) gets coupled through it and produces a current through the BJT.




In Sub-threshold Condition

$$I_D = I_0 \exp\left(\frac{V_{GS} - V_{th}}{n\left(\frac{kT}{q}\right)}\right)$$

$$n = 1 + \frac{C_d}{C_{ox}}$$

$$\text{Slope : } S = n \left( \frac{kT}{q} \right) \ln\left( \frac{I_{D_2}}{I_{D_1}} \right)$$

For  $\frac{I_{D_2}}{I_{D_1}} = 10$  and  $n = 1$ ,  $S = \frac{kT}{q} \ln 10 \approx 60 \text{ mV/decade}$  at room temperature.

*$n$ can never go below 1, and even $n = 1$ is not achievable with today's methods.*

Actual MOS have this

$S = 60 \text{ to } 100 \text{ mV/decade}$

Suppose,  $S = 60 \text{ mV/decade}$  and  $V_{th} = 360 \text{ mV}$

$$\frac{360}{60} = 6 \quad \therefore I_D \text{ at } V_{GS} = 0 \text{ is } 10^{-6} \text{ times}$$

the  $I_D$  at  $V_{GS} = V_{th}$

④ Practical MOS have  $S = 90$  to  $95 \text{ mV/dec}$

|          |          |
|----------|----------|
| Page No. | 105      |
| Date     | 10/10/23 |


⑤ FinFETs have been used to lower the leakage.  $S = 70$  to  $75 \text{ mV/decade}$ . So, leakage current is reduced.

⑥ TFETs (tunnel field effect transistors) have  $S = 40$  to  $45 \text{ mV/decade}$ . This is the future of the VLSI industry: less leakage current.
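The swing figures above translate directly into off-current. The sketch below computes the ideal swing $S = n \frac{kT}{q} \ln 10$ and the ratio of the current at $V_{GS} = 0$ to the current at $V_{GS} = V_{th}$ for a few values of $S$. The 360 mV threshold is the example value from the notes; the other swings are picked from the quoted ranges, and the function names are illustrative.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def subthreshold_swing(n, temp_k=300.0):
    """S = n * (kT/q) * ln(10), in volts per decade of drain current."""
    return n * (K_B * temp_k / Q_E) * math.log(10)

def off_to_threshold_ratio(vth, swing):
    """I_D(V_GS = 0) / I_D(V_GS = V_th) for a given swing (both in volts)."""
    return 10 ** (-vth / swing)

print(subthreshold_swing(1.0))   # ideal limit at 300 K, a little under 60 mV
for s_mv in (60, 70, 90):
    print(s_mv, off_to_threshold_ratio(0.360, s_mv / 1000))
```

A larger swing means fewer decades of current reduction between $V_{th}$ and 0 V, hence more leakage.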

Sub threshold design:



Plot of  $V_{GS} < V_{TH}$  i.e. in Subthreshold condition.

We can see that the plot looks like that of normal MOS but the current  $I_D$  is exponentially lower than what we get in normal MOS at  $V_{GS} > V_{TH}$ . So this current is not useful for any practical purpose.

Calculating  $R_{off}$

When  $V_{in} = V_{DD}$ :



PMOS: OFF  $\rightarrow R_{off}$   
NMOS: ON  $\rightarrow R_{on}$

$$R_{off} = \left( \frac{I_{leak}}{V_{DD}} \right)^{-1}$$

$$\therefore R_{off} = \frac{V_{DD}}{I_{leak}} = \frac{V_{DD}}{I_0 \exp\left(\frac{V_{GS}-V_{th}}{n kT/q}\right)}$$

With  $V_{GS} = 0$  for the OFF transistor,

$$R_{off} = \frac{V_{DD}}{I_0 \exp\left(-\frac{V_{th}}{n kT/q}\right)} = \frac{V_{DD} \exp\left(\frac{V_{th}}{n kT/q}\right)}{I_0}$$

Similarly when  $V_{in} = 0$
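As a rough illustration of how strongly $R_{off}$ depends on the threshold voltage, the sketch below evaluates $R_{off} = V_{DD} / I_{leak}$ with the subthreshold current taken at $V_{GS} = 0$. The value of $I_0$, the thermal voltage and the thresholds are hypothetical.

```python
import math

def r_off(vdd, i0, vth, n, v_thermal=0.02585):
    """R_off = VDD / I_leak, with I_leak = I0 * exp(-Vth / (n * kT/q)) at V_GS = 0."""
    i_leak = i0 * math.exp(-vth / (n * v_thermal))
    return vdd / i_leak

# Hypothetical device: I0 = 1 uA, n = 1.5, VDD = 1 V.
for vth in (0.2, 0.3, 0.4):
    print(vth, r_off(1.0, 1e-6, vth, 1.5))
```

Each extra 100 mV of threshold raises $R_{off}$ by more than a decade for these numbers.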

Threshold Variation



Due to DIBL for short L

## Energy and Delay

note: Always Tradeoff



$$P_{\text{total}} = P_{\text{dyn}} + P_{\text{static}} + P_{\text{direct path}}$$

$$= C V_{DD}^2 f_{0 \rightarrow 1} + V_{DD} I_{\text{leak}} + V_{DD} I_{\text{peak}} t_{sc} f_{0 \rightarrow 1}$$

where  $t_{sc}$  is the time for which both devices conduct during a transition.

## CMOS Logic Revisited



Delay using Elmore delay,

$$t_p = 0.69 (R_1 C_1 + (R_1 + R_2) C_2 + (R_1 + R_2 + R_3) C_3 + (R_1 + R_2 + R_3 + R_4) C_4)$$

If we use our traditional approach of sizing then  $R_1 = R_2 = R_3 = R_4 = 4R$ .

This method gives more delay, since resistor  $R_1$  appears 4 times and  $R_4$  appears only once in the delay equation.

So we make  $R_1$  the smallest and  $R_4$  the largest, to reduce the delay.

Note: This tapered sizing comes into play when intrinsic capacitances like  $C_g$ ,  $C_e$ ,  $C_s$  come into play.

If only  $C_L$  were present, the traditional method would work.
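The benefit of tapering can be seen with a small Elmore-delay calculation. The sketch below compares equal resistances against a tapered set with the same total resistance, under the simplifying (and unrealistic) assumption that the node capacitances stay fixed; in a real layout, widening a transistor also raises its capacitance. All values are in arbitrary units.

```python
def elmore_delay(resistances, capacitances):
    """t_p = 0.69 * sum_k (R_1 + ... + R_k) * C_k for an RC ladder."""
    total = 0.0
    r_path = 0.0
    for r, c in zip(resistances, capacitances):
        r_path += r            # resistance shared by this node and all later ones
        total += r_path * c
    return 0.69 * total

caps = [1.0, 1.0, 1.0, 1.0]
uniform = elmore_delay([4.0, 4.0, 4.0, 4.0], caps)
tapered = elmore_delay([1.0, 3.0, 5.0, 7.0], caps)   # R1 smallest, R4 largest
print(uniform, tapered)
```

Because $R_1$ is counted in every term, shrinking it helps the most, which is the reason for making it the smallest.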

Layout Perspective:



Traditional MOSFET has equal width and length. Method of sizing is to have equal Sizing, i.e.,  $W = L$ .



Tapered Structure  
According to layout rules, there is a minimum distance b/w the gate of MOS and the edge of the diffusion region ( $x$ ) due to which the distance b/w two gates inc. ( $y$ ). Hence capacitance increases.

(\*) Propagation delay deteriorates rapidly as a function of fan-in, quadratically in the worst case.

NAND tp

The t<sub>PLH</sub> increases only slightly, because as fan-in increases more PMOS are connected in parallel. Hence the worst-case path is still 1 transistor.

Gates with a fan-in greater than 4 are not used.

For t<sub>PLH</sub>: resistance is fixed by 1 PMOS (worst case),

but capacitance increases linearly.

(Figure:  $t_p$  of a NAND gate and of an inverter versus fan-out.)

Delay  $t_p$ :  
Fan-in: Quadratic, due to increasing resistance and capacitance.

Fan-out: Each additional fan-out gate adds two gate capacitances to  $C_L$ .

$$t_p = a_1 FI + a_2 FI^2 + a_3 FO$$

## Design Technique for fast Complex Gates

### ① Transistor Sizing →

→ Traditional method ; when  $C_L$  dominates.

→ Progressive sizing ; when intrinsic capacitances come into play.

### ② Transistor Ordering →

Critical path transistors are placed close to the output.

### ③ Alternate logic structures →

Select the best possible structure based on logical effort and intrinsic delay ( $p$ ), depending on whether  $C_L$  is high or low.

### ④ Isolating fan-in from fan-out using buffer insertion. (Path effort needs to be distributed equally)

### ⑤ Reduce the Voltage swing →

$$t_{PHL} = 0.5 \frac{C_L}{I_{DSAT}} V_{swing}$$

Stages. So, more the no. of stages better path effort will

- Linear reduction in delay.
- also reduces power consumption.

total 20M2 + 20R total 20M2

## Ratioed Logic

- ④ Fastest gate in CMOS logic  $\rightarrow$  inverter
- ④ Compare different gates with the inverter using logical effort.

$$\begin{aligned} C_{L,inv} &= 3C_g \\ &= \underbrace{2C_g}_{\text{PMOS}} + \underbrace{C_g}_{\text{NMOS}} \end{aligned}$$

(So we want to remove PMOS because it produces more delay)

Goal: Build gates faster/smaller than static complementary CMOS.

These 3 are ratioed logic gates:

### 1 Resistive Load



Here  $V_{OH} = VDD$   
 $\because$  when PDN is off no current flows

And  $V_{OL} > 0$

Because when PDN is on current flows through resistor  $R_L$  &  $R_{PDN}$  which doesn't allow  $V_{OL}$  to be 0
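The non-zero $V_{OL}$ follows from a simple resistive divider. The sketch below treats the conducting pull-down network as a resistor $R_{PDN}$ in series with the load $R_L$ (a linearised assumption) and shows how $V_{OL}$ falls as the load resistor is made larger relative to the pull-down. Values are hypothetical.

```python
def v_ol(vdd, r_load, r_pdn):
    """Output low level of a resistive-load gate: divider between R_L and R_PDN."""
    return vdd * r_pdn / (r_load + r_pdn)

# Hypothetical: VDD = 1 V, R_PDN = 1 kOhm, increasing load resistance.
for r_load in (1e3, 3e3, 9e3, 99e3):
    print(r_load, v_ol(1.0, r_load, 1e3))
```

$V_{OL}$ only approaches 0 as $R_L / R_{PDN}$ grows, which is exactly what makes the logic ratioed; a large $R_L$ in turn slows the rising edge.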

### 2 Depletion Load



Since  $R_L$  takes up a large area, we replace it with a transistor-based resistor.

### 3 Pseudo-NMOS



Ratioed logic  $\rightarrow$  logic values depend on the sizing of the gates.

Ratioless logic  $\rightarrow$  logic values don't depend on the sizing of the gates.

Eg  $\rightarrow$  CMOS

\* Use NOR gates, not NAND gates to reduce sizing



NOR

NAND

$$V_{OH} = V_{DD}$$

\*  $V_{OL}$  depends on PMOS to NMOS ratio

Ratioed logic: logical effort

\* Rising and falling delays aren't the same

∴ Calculate LE for the two edges separately.

$$LE = \frac{R_g \cdot C_g}{R_{inv} \cdot C_{inv}}$$

LH:

Only  
PMOS ON  
NMOS OFF

$$C_{inv} = \frac{3}{2} w C_L$$

$$C_{nor} = w C_L$$

$\therefore LE = \frac{2}{3}$  (For Low to High)  
(Better than Inv)

HL:

Both PUN and PDN are ON

$$\therefore R_{eff} = R_n || R_p$$

$$\therefore T_{PHL} = R_{eff} \cdot C_L$$

which is less than in the CMOS case, where

$$T_{PHL} = R_{nmos} C_L$$

∴ Effective delay reduces

But intuitively we can say that when  $C_L$  is discharging, the pmos was also on, so it also tries to charge  $C_L$ . Hence  $t_p$  is

## Response on falling Edge



$$I = I_N - I_P \quad \text{and} \quad I_P = \frac{V_{DD}}{R_n + R_P} \quad \text{so} \quad I = I_N - \frac{V_{DD}}{R_n + R_P}$$

$$R_{eff} = \frac{1}{\frac{1}{R_n} - \frac{1}{R_P}} = \frac{R_n}{1 - \frac{R_n}{R_P}}$$

$$R_{diff} = R_n + \frac{1}{\frac{1}{R_n} + \frac{1}{R_P}}$$

When delay is different by two methods for conflict b/w currents, then use this method.

## For High To Low:

NOR:

$$R_{gate} = \frac{R_n}{1 - \left(\frac{R_n}{R_p}\right)} = \frac{R_n}{1 - \left(\frac{R_n}{2R_n}\right)} = 2R_n$$

$$C_{gate} = W C_{ox}$$

Inverter:

$$R_{inv} = R_n$$

$$C_{inv} = 3 W C_{ox}$$

$$LE_{HL} = \frac{2R_n \cdot W C_{ox}}{R_n \cdot 3 W C_{ox}} = \frac{2}{3}$$

(Better than inverter)

$\therefore$  NOR is better than the inverter; the only drawback is that its power dissipation is large.
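The high-to-low logical effort can be reproduced with a few lines of arithmetic. The sketch below uses the contention formula $R_{gate} = R_n / (1 - R_n/R_p)$ with $R_p = 2R_n$ and input capacitances in the ratio 1 : 3, in normalised units; the function names are illustrative.

```python
def contention_resistance(r_n, r_p):
    """Effective pull-down resistance when the PMOS load fights the NMOS."""
    return r_n / (1 - r_n / r_p)

def logical_effort(r_gate, c_gate, r_inv, c_inv):
    """LE = (R_gate * C_gate) / (R_inv * C_inv)."""
    return (r_gate * c_gate) / (r_inv * c_inv)

r_n = 1.0                                        # normalised NMOS resistance
r_gate = contention_resistance(r_n, 2.0 * r_n)   # PMOS load with R_p = 2*R_n
le_hl = logical_effort(r_gate, 1.0, r_n, 3.0)    # C_gate = W*C, C_inv = 3*W*C
print(r_gate, le_hl)
```

A weaker load (larger $R_p$) brings $R_{gate}$ closer to $R_n$ and lowers this logical effort further, at the cost of a slower rising edge.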

\* Now, to eliminate static power dissipation, we use a different method of implementation called Differential Cascode Voltage Switch Logic (DCVSL).

Has +ve feedback and differential logic



(\*) PDN<sub>1</sub> and PDN<sub>2</sub> are mutually exclusive, i.e. when PDN<sub>1</sub> conducts PDN<sub>2</sub> is off,  
and vice versa.

(\*) PDN<sub>1</sub> and PDN<sub>2</sub> are complements of each other and use only NMOS.

(\*) Suppose initially  $\text{Out} = 1$  &  $\overline{\text{Out}} = 0$ , and now PDN<sub>1</sub> conducts & PDN<sub>2</sub> doesn't.

(\*) Now, PDN<sub>1</sub> tries to dec. the value of Out from  $V_{DD}$  to  $V_{DD} - |V_{t,p}|$ , at which M<sub>2</sub> conducts.

As soon as M<sub>2</sub> conducts,  $\overline{\text{Out}} = V_{DD}$ , which turns M<sub>1</sub> off. Hence Out discharges through PDN<sub>1</sub> to ground.

### Properties:

- ① Full Swing
- ② Static power dissipation is 0
- ③ Rail-to-rail logic like others
- ④ Complex in design
- ⑤ Power due to short circuit is present.

Example: NAND / AND



### Advantages:

- ① Signal and its complement are available together. No need for an extra inverter (extra delay) for complement

### Disadvantages:

- ① Lot of wires
- ② Short circuit power