

ECE 6473  
Lecture 4  
Date: 10/05/2015

Shaloo Rakheja  
Assistant Professor  
Electrical and Computer Engineering, NYU

# Reading

- PowerPoint and handwritten lecture notes posted on [newclasses.nyu.edu](http://newclasses.nyu.edu)
- Sections 6.1, 6.2.1 from Digital Integrated Circuits by Jan M. Rabaey et al.

# Homework # 2

- **Due today without any exceptions. If the server is down, please email your homework to the TAs.**

# Homework 1 statistics

Total number of submissions = 84

Mean = 69

Median = 75

Standard Deviation = 22.564

Max. score = 98 (# of students: 1)

Min. score = 0 (# of students: 5)


# CMOS NAND and NOR gates

1. Switching threshold voltage
2. Sizing of gates for delay
3. Dependence of delay on input patterns
4. Fan-in impact on delay
5. Design techniques to minimize delay
6. Logical and branch efforts

# CMOS logic devices



PUN and PDN are **dual** logic networks

# CMOS NAND gate

| A | B | Out |
|---|---|-----|
| 0 | 0 | 1   |
| 0 | 1 | 1   |
| 1 | 0 | 1   |
| 1 | 1 | 0   |

Truth Table of a 2 input NAND gate



# CMOS NOR gate

| A | B | Out |
|---|---|-----|
| 0 | 0 | 1   |
| 0 | 1 | 0   |
| 1 | 0 | 0   |
| 1 | 1 | 0   |

Truth Table of a 2 input NOR gate



# Complex CMOS logic gate



# Voltage transfer characteristics: NAND



If both inputs switch from 0 to  $V_{DD}$ , the switching threshold is higher, as both PMOS transistors are initially ON

# Simplification: series-connected transistors



(a) Separate transistors

*Two devices with*

$$\beta = \left( \frac{W}{L} \right) \mu C_{ox}$$

(b) Single equivalent FET

*Single device with*

$$\beta^{(1)} = \left( \frac{W}{2L} \right) \mu C_{ox} = \frac{\beta}{2}$$

# Simplification: parallel-connected transistors



(a) Separate transistors

*Two devices with*

$$\beta = \left( \frac{W}{L} \right) \mu C_{ox}$$



(b) Single equivalent FET

*Single device with*

$$\beta^{(1)} = \left( \frac{2W}{L} \right) \mu C_{ox} = 2\beta$$

# Switching threshold of NAND2 gate: simultaneous switching of inputs



$$V_M = \frac{V_{DD} - |V_{thp}| + (1/2)\sqrt{\beta_n/\beta_p} V_{thn}}{1 + (1/2)\sqrt{\beta_n/\beta_p}}$$

# Switching characteristics: NAND2



# Capacitances



$$C_{out} = C_{ext} + \sum_{M2, M3, M4} (C_{jn} + C_{miller})$$

$$C_X = \sum_{M1, M2} (C_{jn} + C_{miller})$$

At the output node, we have the external capacitance  $C_{ext}$  and the sum of the parasitic (junction and Miller) capacitances of M2, M3, M4.

At the internal node, we only have the parasitic (junction and Miller) capacitances of M1 and M2.

# Switching characteristics: NAND2



*Low-to-high Transition*



*High-to-Low Transition*

$$t_{pLH} = 0.69 R_p C_{out}$$

$$t_{pHL} = 0.69(R_n + R_n)C_{out} + 0.69R_n C_X$$
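The two expressions above can be evaluated directly. The following is a minimal sketch; the function name and the numeric values (10 kΩ devices, 10 fF load, 2 fF internal node) are hypothetical and chosen only for illustration.

```python
def nand2_delays(r_p, r_n, c_out, c_x):
    """Return (t_pLH, t_pHL) in seconds for a NAND2 using the 0.69*RC model."""
    # Worst-case pull-up: a single PMOS charges the output node.
    t_plh = 0.69 * r_p * c_out
    # Pull-down: two series NMOS discharge C_out, while the internal
    # node X discharges through one NMOS only.
    t_phl = 0.69 * (r_n + r_n) * c_out + 0.69 * r_n * c_x
    return t_plh, t_phl


# Hypothetical values, not taken from the lecture.
t_plh, t_phl = nand2_delays(r_p=10e3, r_n=10e3, c_out=10e-15, c_x=2e-15)
print(f"t_pLH = {t_plh * 1e12:.1f} ps, t_pHL = {t_phl * 1e12:.1f} ps")
```

With equal device resistances the pull-down is more than twice as slow as the pull-up, which is why the NMOS devices are the ones that get upsized.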

# Transistor sizing for NAND2

- First identify the path for the worst-case delay (high to low; low to high).
- Ignore the internal node capacitance.
- **Find the size of PFET in NAND2 that gives the same  $t_{pLH}$  as the minimum-sized inverter.**
- **Find the size of NFET in NAND2 that gives the same  $t_{pHL}$  as in the minimum-sized inverter.**

Size only for the worst-case delay path in NAND2:

- Only one PMOS conducts in the worst-case low-to-high transition.
- Both NMOS conduct (in series) in the high-to-low transition.

# Transistor sizing for NAND2 gate



PFET size in NAND2 is the same as in the min. sized inverter.  
That is  $\beta_{p\_NAND2} = \beta_{p\_inverter}$ .

NFET size in NAND2 is doubled.  
That is  $\beta_{n\_NAND2} = 2\beta_{n\_inverter}$ .
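Both results follow from matching RC products with the internal node capacitance ignored. A short derivation, assuming the on-resistance scales as $R \propto 1/\beta$ (the subscripts $NAND2$ and $inv$ are notation introduced here):

$$\begin{aligned}
t_{pLH}: \quad 0.69\, R_{p,NAND2}\, C_{out} &= 0.69\, R_{p,inv}\, C_{out} \Rightarrow R_{p,NAND2} = R_{p,inv} \Rightarrow \beta_{p,NAND2} = \beta_{p,inv} \\
t_{pHL}: \quad 0.69\, (2R_{n,NAND2})\, C_{out} &= 0.69\, R_{n,inv}\, C_{out} \Rightarrow R_{n,NAND2} = \frac{R_{n,inv}}{2} \Rightarrow \beta_{n,NAND2} = 2\beta_{n,inv}
\end{aligned}$$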

# Input pattern dependence in switching characteristics of NAND2: **Low to High**



- **When  $B=1$ , and  $A: 1 \rightarrow 0$ , the pull-up PMOS needs to charge both  $C_{out}$  and  $C_x$**
- **When  $A=1$ , and  $B: 1 \rightarrow 0$ , the pull-up PMOS needs to charge only  $C_{out}$**

# Input pattern dependence in switching characteristics of NAND2: High to Low



- When  $B=1$ , and  $A: 0 \rightarrow 1$ , the **internal node is charged and both  $C_{out}$  and  $C_x$  need to be discharged**
- When  $A=1$ , and  $B: 0 \rightarrow 1$ , the **internal node is already discharged and only  $C_{out}$  needs to be discharged**

# NOR2 gate



If both inputs switch from 0 to  $V_{DD}$ , the switching threshold is lower, as both NMOS transistors turn ON

# Switching threshold of NOR2 gate: simultaneous switching of inputs



$$V_M = \frac{V_{DD} - |V_{thp}| + 2\sqrt{\beta_n/\beta_p} V_{thn}}{1 + 2\sqrt{\beta_n/\beta_p}}$$
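The NAND2 and NOR2 switching-threshold formulas differ only in the factor multiplying $\sqrt{\beta_n/\beta_p}$ (1/2 for NAND2, 2 for NOR2). The sketch below evaluates both. Treating a factor of 1 as the plain inverter is an assumption not stated on these slides, and the supply and threshold voltages are hypothetical.

```python
import math


def switching_threshold(vdd, vthn, vthp_abs, beta_n, beta_p, factor=1.0):
    """V_M with the sqrt(beta_n/beta_p) term scaled by `factor`.

    factor = 0.5 -> NAND2, 2.0 -> NOR2 (both inputs switching together).
    factor = 1.0 is assumed here to correspond to a plain inverter.
    """
    k = factor * math.sqrt(beta_n / beta_p)
    return (vdd - vthp_abs + k * vthn) / (1.0 + k)


# Hypothetical operating point: 2.5 V supply, 0.4 V thresholds, beta_n = beta_p.
params = dict(vdd=2.5, vthn=0.4, vthp_abs=0.4, beta_n=1.0, beta_p=1.0)
vm_inv = switching_threshold(**params, factor=1.0)
vm_nand2 = switching_threshold(**params, factor=0.5)
vm_nor2 = switching_threshold(**params, factor=2.0)
print(f"inverter {vm_inv:.3f} V, NAND2 {vm_nand2:.3f} V, NOR2 {vm_nor2:.3f} V")
```

For these values the NAND2 threshold sits above the inverter's and the NOR2 threshold below it, matching the qualitative statements on the VTC slides.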

# Switching characteristics: NOR2

$$t_{pHL} = 0.69 R_n C_{out}$$



(a) Discharging circuit



$$t_{pLH} = 0.69 \left[ 2R_p C_{out} + R_p C_x \right]$$

(b) Charging circuit

# Transistor sizing for NOR2

- First identify the path for the worst-case delay (high to low; low to high).
- Ignore the internal node capacitance.
- **Find the size of PFET in NOR2 that gives the same  $t_{pLH}$  as the minimum-sized inverter.**
- **Find the size of NFET in NOR2 that gives the same  $t_{pHL}$  as the minimum-sized inverter.**

Size only for the worst-case delay path in NOR2:

- Both PMOS conduct (in series) in the low-to-high transition.
- Only one NMOS conducts in the worst-case high-to-low transition.

# Transistor sizing for NOR2 gate



PFET size in NOR2 is doubled from the min. sized inverter.  
That is  $\beta_{p\_NOR2} = 2\beta_{p\_inverter}$ .

NFET size in NOR2 is the same as in min. sized inverter.  
That is  $\beta_{n\_NOR2} = \beta_{n\_inverter}$ .

# Sizing of complex logic gate

$$f = \overline{(a \cdot b + c \cdot d)} \cdot x$$




# Fan-in consideration



$$t_{pHL} \propto (R_1)C_1 + (R_1 + R_2)C_2 + (R_1 + R_2 + R_3)C_3 + (R_1 + R_2 + R_3 + R_4)C_L$$

Say  $R_1 = R_2 = R_3 = R_4 = R_{NMOS}$

$$C_1 = C_2 = C_3 = C_N$$

$$C_L = C_{ext} + 4C_p + C_N$$

$$t_{pHL} \propto R_{NMOS} (C_N + 2C_N + 3C_N) + 4R_{NMOS} C_L$$

$$t_{pHL} \propto R_{NMOS} (C_N + 2C_N + 3C_N) + 4R_{NMOS} (C_{ext} + 4C_p + C_N)$$

# Extend Fan-in to N inputs



$$t_{pHL} \propto R_{NMOS} (C_N + 2C_N + 3C_N) + 4R_{NMOS} C_L$$

$$t_{pHL} \propto R_{NMOS} (C_N + 2C_N + 3C_N) + 4R_{NMOS} (C_{ext} + 4C_p + C_N)$$

$$t_{pHL} \propto R_{NMOS} C_N (1 + 2 + 3 + \dots + N) + N^2 R_{NMOS} C_p + NR_{NMOS} C_{ext}$$

$$t_{pHL} \propto 0.5N(N+1)R_{NMOS} C_N + N^2 R_{NMOS} C_p + NR_{NMOS} C_{ext}$$

Propagation delay degrades rapidly as a function of fan-in: **quadratically** in the worst case.
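The growth with fan-in can be checked numerically by evaluating the Elmore sum term by term. This is a sketch under stated assumptions: the load for an n-input gate is taken as $C_{ext} + nC_p + C_N$, which generalizes the 4-input expression $C_L = C_{ext} + 4C_p + C_N$, and the unit values are hypothetical.

```python
def nand_tphl_elmore(n, r_nmos, c_n, c_p, c_ext):
    """Elmore sum (proportional to t_pHL) for an n-input NAND pull-down chain."""
    # Assumed generalization of the 4-input load: n PMOS drains plus one NMOS drain.
    c_load = c_ext + n * c_p + c_n
    # The i-th internal node (counted from ground) sees i series NMOS resistances.
    internal = sum(i * r_nmos * c_n for i in range(1, n))
    # The output node sees all n series resistances.
    return internal + n * r_nmos * c_load


# Normalized, hypothetical component values.
for n in (2, 4, 8, 16):
    print(n, nand_tphl_elmore(n, r_nmos=1.0, c_n=1.0, c_p=1.0, c_ext=4.0))
```

Doubling the fan-in more than doubles the sum, which is the behaviour the slides describe as quadratic.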

## Effect of fan-in and fan-out

- Fan-in: **quadratic** due to increasing resistance and capacitance
- Fan-out: each additional fan-out gate adds **two** gate capacitances to  $C_L$

$$t_p = a_1 FI + a_2 FI^2 + a_3 FO$$

# **Design Techniques for fast complex gate**

# Design Technique - 1

## Transistor sizing

- Effective as long as the fan-out capacitance dominates

## Progressive sizing



Distributed RC line

$M_1 > M_2 > M_3 > \dots > M_N$  (the FET closest to the output is the smallest)

# Design Technique - 2

## Transistor ordering



delay determined by time to discharge  $C_L$ ,  $C_1$  and  $C_2$



delay determined by time to discharge  $C_L$

# Design Technique - 3

- Alternative logic structures to reduce Fan-in

$$F = ABCDEFGH$$



*Higher Fan-in*



*Higher fan-out*

# Design Technique - 4

- Isolating fan-in from fan-out using buffer insertion



**Logical Effort (g)**  
**Electrical Effort (h)**  
**Branching Effort (b)**

**Stage effort =  $f = g \cdot h \cdot b$**

# Logical effort of a complex gate



$$\text{Logical effort: } g = \frac{C_{in}}{C_{ref}}$$

$$\text{Electrical effort: } h = \frac{C_{out}}{C_{in}}$$

This was previously defined as effective fan-out

What is  $C_{ref}$  ?

Input capacitance of a reference inverter

Reference inverter has  $\beta_n = \beta_p$

***r is chosen such that  $\beta_p = \beta_n$***

$$C_{in} = (A_{Gn} + A_{Gp})C_{ox}$$

$$C_{ref} = C_{ox}L(1+r)W_n = C_n(1+r)$$

$$C_{in} = C_{ref} \Rightarrow g_{inv} = 1$$



Note:  $h$  is also the effective fanout ( $f$ ) introduced earlier

# Reference inverter

- A reference inverter is the minimum-sized inverter (often called the “1X inverter”) for that technology.
- It is sized such that:  
 $W_n = W_{\min}$  and  $W_p = rW_n$
- $r \approx 2$  (0.25  $\mu\text{m}$  technology) gives  $R_n = R_p = R_{\text{ref}}$
- $C_{\text{ref}} = (1+r)C_n$
- By default, the logical effort of 1X inverter is unity. That is,  $g_{\text{inv}} = 1$

# Delay of 1X or reference inverter



$$\text{delay time} = d_{abs} = kR_{ref} \left( C_{FET,ref} + C_{out} \right)$$

Here  $C_{FET,ref}$  is the internal parasitic capacitance of the reference inverter.

# Delay of 1X or reference inverter



$$\begin{aligned}
 d_{abs} &= kR_{ref}C_{FET,ref} + kR_{ref}C_{out} \\
 &= \underbrace{kR_{ref}C_{ref}}_{\tau_r} \underbrace{\left( \frac{C_{FET,ref}}{C_{ref}} \right)}_{p} + \underbrace{kR_{ref}C_{ref}}_{\tau_r} \underbrace{\left( \frac{C_{out}}{C_{ref}} \right)}_{h}
 \end{aligned}$$

$$d = \frac{d_{abs}}{\tau_r} = p + h$$

# Delay of a bigger inverter (Scale factor = S)



$$d_{abs} = k \frac{R_{ref}}{S} \left( SC_{FET\_ref} + C_{out} \right)$$

$$\begin{aligned} \text{delay time } d_{abs} &= kR(C_{FET} + C_{out}) = k \frac{R_{ref}}{S} (SC_{FET,ref} + C_{out}) \\ &= kR_{ref}C_{FET,ref} + k \frac{R_{ref}}{S} \frac{C_{out}}{C_{ref}} C_{ref} = kR_{ref}C_{FET,ref} + kR_{ref}C_{ref} \left( \frac{C_{out}}{C_{in}} \right) \\ &= \underbrace{kR_{ref}C_{ref}}_{\tau_r} \left( \underbrace{\frac{C_{FET,ref}}{C_{ref}}}_{p} + \underbrace{\frac{C_{out}}{C_{in}}}_{h} \right) \end{aligned}$$

$$\text{normalized delay } d = \frac{d_{abs}}{\tau_r} = p + h$$
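The derivation above can be checked numerically. The sketch below uses normalized, hypothetical values ($R_{ref} = C_{ref} = C_{FET,ref} = 1$, $k = 0.69$) and the relation $C_{in} = S\,C_{ref}$ for the scaled inverter.

```python
def inverter_delay(s, c_out, r_ref=1.0, c_ref=1.0, c_fet_ref=1.0, k=0.69):
    """Return (d, p, h): normalized delay, parasitic delay, electrical effort."""
    tau_r = k * r_ref * c_ref
    # Scaling by s divides the resistance and multiplies the parasitic capacitance.
    d_abs = k * (r_ref / s) * (s * c_fet_ref + c_out)
    p = c_fet_ref / c_ref        # independent of s
    h = c_out / (s * c_ref)      # C_in of the scaled inverter is s * C_ref
    return d_abs / tau_r, p, h


# Same hypothetical load (8 units) driven by inverters of increasing size.
for s in (1, 2, 4):
    d, p, h = inverter_delay(s, c_out=8.0)
    print(f"S={s}: d={d:.2f}, p={p:.2f}, h={h:.2f}")
```

Only the effort term shrinks as the inverter grows, while the parasitic term stays fixed.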

Size has no effect on “p”; size affects only “h”.

## 2-stage inverter chain



$$D = d_1 + d_2 = (h_1 + p_1) + (h_2 + p_2)$$

$$\text{path electrical effort} = H = C_{last}/C_{first} = C_3/C_1 = \underbrace{(C_3/C_2)}_{h_2} \underbrace{(C_2/C_1)}_{h_1}$$

$$D = (h_1 + p_1) + (H/h_1 + p_2)$$

$$\text{To minimize delay: } \frac{\partial D}{\partial h_1} = 0 = (1 - H/h_1^2)$$

$$h_1^2 = H = h_1 h_2 \Rightarrow h_1 = h_2$$
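The result $h_1 = h_2 = \sqrt{H}$ can be confirmed with a brute-force sweep. This is a sketch only; the path electrical effort $H = 16$ and the parasitic delays $p_1 = p_2 = 1$ are hypothetical values.

```python
import math


def two_stage_delay(h1, path_h, p1=1.0, p2=1.0):
    """Normalized delay D = (h1 + p1) + (H/h1 + p2) of a two-inverter chain."""
    return (h1 + p1) + (path_h / h1 + p2)


H = 16.0  # hypothetical path electrical effort C3/C1
candidates = [0.5 + 0.01 * i for i in range(1000)]  # sweep h1 upward from 0.5
best_h1 = min(candidates, key=lambda h1: two_stage_delay(h1, H))
print(f"best h1 = {best_h1:.2f}, sqrt(H) = {math.sqrt(H):.2f}")
```

The parasitic terms shift the whole curve up without moving its minimum, so they do not affect the choice of $h_1$.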

# Logical effort: NAND2 and NOR2



(a) NAND2

*each input:*  $C_{in} = C_{Gn}(2 + r)$

$$g_{NAND2} = \frac{C_{in}}{C_{ref}} = \frac{2 + r}{1 + r}$$

$$g_{NAND\_n} = \frac{n + r}{1 + r}$$



(b) NOR2

*each input:*  $C_{in} = C_{Gn}(1 + 2r)$

$$g_{NOR2} = \frac{C_{in}}{C_{ref}} = \frac{1 + 2r}{1 + r}$$

$$g_{NOR\_n} = \frac{1 + nr}{1 + r}$$
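The two general formulas are easy to tabulate. A minimal sketch using exact fractions; the function names are illustrative, and `r` must be an integer or `Fraction` for the exact arithmetic to work.

```python
from fractions import Fraction


def g_nand(n, r):
    """Logical effort of an n-input NAND: (n + r) / (1 + r)."""
    return Fraction(n + r, 1 + r)


def g_nor(n, r):
    """Logical effort of an n-input NOR: (1 + n*r) / (1 + r)."""
    return Fraction(1 + n * r, 1 + r)


r = 2  # PMOS/NMOS width ratio of the reference inverter
for n in (1, 2, 3, 4):
    print(n, g_nand(n, r), g_nor(n, r))
```

For any $r > 1$ the NOR effort grows faster with the number of inputs than the NAND effort, because each added input widens the series PMOS stack.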

# Delay of a complex logic gate



Design the gate with the resistance =  $R_{ref}$

$$\begin{aligned} d_{abs} &= kR_{ref}(C_{FET} + C_{out}) \\ &= kR_{ref}C_{ref}\underbrace{\left(\frac{C_{FET}}{C_{ref}}\right)}_p + kR_{ref}C_{ref}\underbrace{\left(\frac{C_{in}}{C_{ref}}\right)}_g\underbrace{\left(\frac{C_{out}}{C_{in}}\right)}_h \end{aligned}$$

# Delay of a complex logic gate



$$d_{abs} = kR_{ref} (C_{FET} + C_{out})$$

$$d = \frac{d_{abs}}{\tau_r} = \left( \frac{C_{FET}}{C_{ref}} \right) + \left( \frac{C_{in}}{C_{ref}} \right) \left( \frac{C_{out}}{C_{in}} \right) = p + gh$$

# Delay of a complex logic gate with bigger size (Size factor = S)



$$d_{abs} = k \frac{R_{ref}}{S} (SC_{FET} + C_{out})$$

$$d = \frac{d_{abs}}{\tau_r} = \left( \frac{C_{FET}}{C_{ref}} \right) + \left( \frac{C_{in}}{C_{ref}} \right) \left( \frac{C_{out}}{SC_{in}} \right) = p + gh$$

Effort delay = *gh*

# Logical effort: summary

- Inverter has the smallest logical effort and intrinsic delay of all static CMOS gates
- Logical effort of a gate is the ratio of its input capacitance to that of the reference inverter when the gate is sized to deliver the same current
- Logical effort increases with the gate complexity
- Logical effort depends only on the type of gate and the  $W_p/W_n$  ratio of the reference inverter (technology constant).
- The size of the gate enters only through the electrical effort.

# Logical effort: summary



$$g_{NOT} = 1$$



$$g_{NAND2} = \frac{2+r}{1+r} = \frac{4}{3} \quad (r = 2)$$



$$g_{NOR2} = \frac{1+2r}{1+r} = \frac{5}{3} \quad (r = 2)$$

# Logical effort of complex gates



# Path logical effort

$$\text{stage delay} = d_i = g_i h_i + p_i$$

$$\text{Total path delay} = D = \sum_{i=1,\dots,N} d_i = \sum_{i=1,\dots,N} (g_i h_i + p_i)$$

$$\text{Path logical effort} = G = \prod_{i=1,\dots,N} g_i$$

$$\text{Path electrical effort} = H = \prod_{i=1,\dots,N} h_i = \frac{C_{last}}{C_{first}}$$

$$\text{Path effort} = F = GH = (g_1 h_1)(g_2 h_2) \dots (g_N h_N) = f_1 f_2 \dots f_N$$

# Minimum path delay

*For minimum delay:  $f_1 = f_2 = \dots = f_N = f = F^{1/N} = (GH)^{1/N}$*

*Upsizing factor for each stage  $h_i$*

$$g_i h_i = F^{1/N} \Rightarrow h_i = \frac{F^{1/N}}{g_i}$$

*Optimum Path Delay =  $D_{opt} = NF^{1/N} + \sum_{i=1,\dots,N} p_i = NF^{1/N} + P$*

**Key: Make all the stage efforts equal**

## Example: optimize logic path



*Path Logical effort = G = ?*

*Path electrical effort = H = ?*

*Path effort = F = GH*

*stage effort = f = (GH)<sup>1/N</sup>*

*a =*

*b =*

*c =*

## Example: optimize logic path



### **Gate logical effort**

**NOT Gate:**  $g_1 = 1, g_4 = 1$

**3-Input NAND:**  $g_2 = (3+r)/(1+r) = 5/3$

**2-input NOR:**  $g_3 = (1+2r)/(1+r) = 5/3$

## Example: optimize logic path



$$\text{Path Logical effort} = G = 1(5/3)(5/3)1 = 25/9$$

$$\text{Path electrical effort} = H = 5/1 = 5$$

$$\text{Path effort} = F = GH = 5(25/9) = 125/9$$

$$\text{stage effort} = f = (GH)^{1/N} = (125/9)^{1/4} = 1.93$$

$$a = 1.93/g_1 = 1.93$$

$$b = (f/g_2) \times a = 2.23$$

$$c = (f/g_3) \times b = 3 \times (1.93/5) \times 2.23 = 2.59$$

$$\text{or, } 5/c = (f/g_4) \Rightarrow c = 5g_4/f = 5/1.93 = 2.59$$
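The same numbers can be reproduced by working backward from the load with an equal stage effort. In this sketch, $a$, $b$, $c$ are read as the input capacitances of stages 2 to 4 normalized to the first stage, which is how the equations above use them. The function name is illustrative, and the unit input capacitance and load of 5 come from $H = 5/1$.

```python
def size_path(g, c_in_first, c_load):
    """Return (f, input capacitances) for minimum delay on an unbranched path.

    g lists the logical efforts per stage, first to last.
    """
    n = len(g)
    path_g = 1.0
    for gi in g:
        path_g *= gi
    path_h = c_load / c_in_first
    f = (path_g * path_h) ** (1.0 / n)  # equal stage effort
    # Work backward from the load: C_in,i = g_i * C_out,i / f.
    caps = [0.0] * n
    c_out = c_load
    for i in range(n - 1, -1, -1):
        caps[i] = g[i] * c_out / f
        c_out = caps[i]
    return f, caps


# Inverter, NAND3, NOR2, inverter with r = 2, as in the example.
f, caps = size_path([1.0, 5 / 3, 5 / 3, 1.0], c_in_first=1.0, c_load=5.0)
print(f"f = {f:.2f}")
print("input capacitances:", [round(c, 2) for c in caps])
```

The first entry of `caps` comes back as the given input capacitance, which serves as a consistency check on the backward pass.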

# Branching effort



$$\text{Path branching effort} = B = \prod_{i=1,\dots,N} b_i$$

**Path effort =  $F = GHB$**   
= **Logical effort x Elec. effort x Branch effort**

# Example: Branching effort



$$b_1 = \frac{C_{NAND2} + C_{NOR}}{C_{NAND2}} = \frac{(2+r) + (1+2r)}{(2+r)} = \frac{3(1+r)}{(2+r)} = \frac{9}{4}$$

$$b_2 = \frac{C_{NOT} + C_{NOR}}{C_{NOT}} = \frac{(1+r) + (1+2r)}{(1+r)} = \frac{2+3r}{(1+r)} = \frac{8}{3}$$

$$B = b_1 b_2 = \frac{9}{4} \cdot \frac{8}{3} = 6$$

# Delay minimization with branching effort



$$F = GBH = 1 \cdot (4 \cdot 4) \cdot 32 = 512 \Rightarrow f = F^{1/3} = 8$$

$$f = b_1 g_1 h_1 = 4h_1 = 8 \Rightarrow h_1 = C_{g2}/C_{g1} = 2 \Rightarrow C_{g2} = 2C_{g1}$$

$$f = b_2 g_2 h_2 = 4h_2 = 8 \Rightarrow h_2 = C_{g3}/C_{g2} = 2 \Rightarrow C_{g3} = 2C_{g2} = 4C_{g1}$$

$$f = b_3 g_3 h_3 = 8 \Rightarrow h_3 = C_L/C_{g3} = 8 \Rightarrow C_{g3} = 32C_{g1}/8 = 4C_{g1}$$

$$\frac{4C_{g2}}{C_{g1}} = \frac{4C_{g3}}{C_{g2}} = \frac{C_L}{C_{g3}} = 8$$
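The sizing above can be reproduced by including the branching effort in each stage effort, $f_i = g_i b_i h_i$. This is a sketch; the values $g = (1, 1, 1)$, $b = (4, 4, 1)$ and $C_L = 32C_{g1}$ are inferred from the equations above rather than read from the figure, and the function name is illustrative.

```python
def size_branched_path(g, b, c_in_first, c_load):
    """Return (f, per-gate input capacitances) for a path with branching."""
    n = len(g)
    path_effort = c_load / c_in_first  # H
    for gi, bi in zip(g, b):
        path_effort *= gi * bi         # times G and B
    f = path_effort ** (1.0 / n)
    # Backward pass: f = g_i * b_i * C_next / C_in,i for every stage.
    caps = [0.0] * n
    c_next = c_load
    for i in range(n - 1, -1, -1):
        caps[i] = g[i] * b[i] * c_next / f
        c_next = caps[i]
    return f, caps


f, caps = size_branched_path(g=[1, 1, 1], b=[4, 4, 1], c_in_first=1.0, c_load=32.0)
print(f"f = {f:.2f}, C_g1..C_g3 = {[round(c, 2) for c in caps]}")
```

Each gate on the path is only twice the size of the one before it, even though the stage effort is 8, because the branches absorb a factor of 4 at the first two stages.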

# Optimizing number of stages

- We know from the inverter chain example that the optimal number of stages depends on  $C_{last}/C_{first} = \text{Path Electrical Effort } H$
- For chain of gates the optimal number of stages depends on the total Path Effort =  $F = GBH$

# Optimizing number of stages

- Inverters do not add to the path effort  $F$ , because
  - $g_{NOT} = 1$ , and  $H = C_{last}/C_{first}$  is independent of the individual  $h_i$
- We can add an arbitrary number of inverters to a path without affecting the path effort



- The total delay is given by:

$$D = NF^{1/N} + \sum_{i=1, \dots, n_1} p_i + (N - n_1)p_{inv}$$

# Optimizing number of stages

- For minimum delay:

$$\frac{\partial D}{\partial N} = F^{1/N} - F^{1/N} \ln(F^{1/N}) + p_{inv} = 0$$

- Let the best value of N be  $N_{opt}$

- Assume,

$$\rho = F^{1/N_{opt}} = \text{ideal stage effort}$$

- Therefore, we have:

$$\rho(1 - \ln(\rho)) + p_{inv} = 0$$

- For  $p_{inv} = 0$ ,  $\rho = e$

- For  $p_{inv} = 1$ ,  $\rho = 3.6$

- $N_{opt} = \ln(F)/\ln(\rho)$
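The equation for $\rho$ has no closed-form solution when $p_{inv} \neq 0$, so it is usually solved numerically. Below is a minimal bisection sketch; the function names, the bracket $[1, 50]$ and the tolerance are choices made here, not part of the lecture.

```python
import math


def optimal_stage_effort(p_inv, lo=1.0, hi=50.0, tol=1e-10):
    """Solve rho * (1 - ln(rho)) + p_inv = 0 for rho by bisection."""
    def residual(rho):
        return rho * (1.0 - math.log(rho)) + p_inv

    # The residual is positive at rho = 1 and decreases for rho > 1,
    # so the root lies where it changes sign.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def optimal_stage_count(path_effort, p_inv):
    """N_opt = ln(F) / ln(rho), before rounding to an integer."""
    return math.log(path_effort) / math.log(optimal_stage_effort(p_inv))


for p_inv in (0.0, 1.0):
    print(f"p_inv = {p_inv}: rho = {optimal_stage_effort(p_inv):.3f}")
print(f"N_opt for F = 512, p_inv = 1: {optimal_stage_count(512.0, 1.0):.2f}")
```

In practice the result is rounded to a whole number of stages, so the stage effort actually used ends up close to, not exactly at, the ideal $\rho$.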

# Class on Oct. 13, 2015

- Combinational logic styles
  - Pass transistor
  - Transmission gate
  - Dynamic logic
- Power dissipation in CMOS circuits
- Sequential elements