



**EECS 151/251A**  
**Spring 2021**  
**Digital Design and**  
**Integrated Circuits**

Instructor:  
Wawrzynek

Lecture 15 - Exam1 Review

# Announcements

- Virtual Front Row for today 3/9:
  - James Shi
  - Parth Nobel
  - Zitao Fang
  - Daniel Guzman
- More questions/comments please!
- ***No class Thursday 3/11. Exam 6-9PM***
- Homework #6 assignment solutions posted today - part of exam 1.
- No problem set due Monday.

# **Exam 1 Details - Thur 3/11 6-9PM**

- Open-book, notes, on-line references.
  - *Can't talk or communicate about the exam with anyone during the exam except for course staff.*
- Need to apply concepts from class and homework
  - you'll need to figure out what concept to apply and how
- Homework problems are a good indicator of topic priority
- Previous exams posted, to understand my style
- Zoom proctoring
  - answer your questions
  - help keep you honest
  - check Piazza for details of procedure
- Remember, the course is *not curved*. (You do not need to cheat to be competitive)

# *Review with sample slides*

- Do not study only the following slides.  
These are just representative of what you need to know.
- Go back and study the entire lecture.
- A list of important topics (more comprehensive than this set of slides) is posted on website:
  - “exam1-information.pdf”

# *Moore's Law – 2x transistors per 1-2 yr*



# Dennard Scaling

Things we do: scale dimensions, doping, Vdd.



What we get: $\kappa^2$ times as many transistors at the same power density, whose gates switch $\kappa$ times faster!

TABLE I  
SCALING RESULTS FOR CIRCUIT PERFORMANCE

| Device or Circuit Parameter                  | Scaling Factor |
|----------------------------------------------|----------------|
| Device dimension $t_{\text{ox}}$ , $L$ , $W$ | $1/\kappa$     |
| Doping concentration $N_a$                   | $\kappa$       |
| Voltage $V$                                  | $1/\kappa$     |
| Current $I$                                  | $1/\kappa$     |
| Capacitance $\epsilon A/t$                   | $1/\kappa$     |
| Delay time/circuit $VC/I$                    | $1/\kappa$     |
| Power dissipation/circuit $VI$               | $1/\kappa^2$   |
| Power density $VI/A$                         | 1              |

Power density scaling ended in 2003  
(Pentium 4: 3.2GHz, 82W, 55M FETs).
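The rows of Table I follow from one another, which a few lines of arithmetic can confirm. The sketch below is only an illustration: the function name `dennard_scale` and the use of exact fractions are choices made here, not part of the lecture.

```python
from fractions import Fraction


def dennard_scale(kappa):
    """Scaling factors from Table I for an integer or Fraction kappa."""
    k = Fraction(kappa)
    dimension = 1 / k                           # t_ox, L, W each shrink by 1/kappa
    voltage = 1 / k
    current = 1 / k
    area = dimension * dimension                # A ~ W * L
    capacitance = area / dimension              # eps * A / t
    delay = voltage * capacitance / current     # V * C / I
    power = voltage * current                   # V * I
    return {
        "capacitance": capacitance,
        "delay": delay,
        "power": power,
        "power_density": power / area,          # V * I / A
        "transistors_per_area": 1 / area,
    }


scaled = dennard_scale(2)
print(scaled["delay"], scaled["power"], scaled["power_density"])  # 1/2 1/4 1
```

With $\kappa = 2$ the model gives four times the transistors per unit area, half the delay, and an unchanged power density, matching the table.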

# *Design Space & Optimality*



# Cost

- **Non-recurring** engineering (NRE) costs
  - Cost to develop a design (product)
  - Amortized over all units shipped
  - E.g. \$20M in development adds \$.20 to each of 100M units
- **Recurring** costs
  - Cost to manufacture, test and package a unit
  - Processed wafer cost is ~\$10k (around 16nm node) which yields:
    - 50-100 large FPGAs or GPUs
    - 200 laptop CPUs
    - >1000 cell phone SoCs

$$\text{cost per IC} = \text{variable cost per IC} + \frac{\text{fixed cost}}{\text{volume}}$$

$$\text{variable cost} = \frac{\text{cost of die} + \text{cost of die test} + \text{cost of packaging}}{\text{final test yield}}$$
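The two formulas can be tried out numerically. The sketch below reuses the \$20M / 100M-unit figure from the slide; the die, test, packaging, and yield numbers are hypothetical values picked only to make the arithmetic concrete, and the function names are invented here.

```python
def variable_cost(die, die_test, packaging, final_test_yield):
    """Variable cost per good IC; final_test_yield is a fraction in (0, 1]."""
    return (die + die_test + packaging) / final_test_yield


def cost_per_ic(variable, fixed_cost, volume):
    """Total cost per IC: variable cost plus the amortized fixed (NRE) cost."""
    return variable + fixed_cost / volume


# NRE share from the slide: $20M spread over 100M units.
nre_share = cost_per_ic(0.0, 20e6, 100e6)

# Hypothetical phone SoC: a $10k wafer yielding about 1000 dies gives a $10 die;
# assume $1 test, $1 packaging, and 96% final test yield.
soc_cost = cost_per_ic(variable_cost(10.0, 1.0, 1.0, 0.96), 20e6, 100e6)

print(round(nre_share, 2), round(soc_cost, 2))  # 0.2 12.7
```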

# *Relationship Among Representations*

- Theorem: Any Boolean function that can be expressed as a truth table can be written as an expression in Boolean Algebra using AND, OR, NOT.



*How do we convert from one to the other?*

# Inverter Example of Restoration

Example (look at 1-input gate, to keep it simple):



Idealized Inverter



Actual Inverter

- Inverter acts like a “non-linear” amplifier
- The non-linearity is critical to restoration
- Other logic gates act similarly with respect to input/output relationship.

# Register Transfer Level Abstraction (RTL)

Any synchronous digital circuit can be represented with:

- Combinational Logic Blocks (CL), plus
- State Elements (registers or memories)



- State elements are mixed in with CL blocks to control the flow of data.
- Sometimes used in large groups by themselves for “long-term” data storage.

# Implementation Alternative Summary

| Implementation style          | How it is customized                                                                            |
|-------------------------------|-------------------------------------------------------------------------------------------------|
| Full-custom:                  | All circuits/transistors layouts optimized for application.                                     |
| Standard-cell:                | Small function blocks/"cells" (gates, FFs) automatically placed and routed.                     |
| Gate-array (structured ASIC): | Partially prefabricated wafers with arrays of transistors customized with metal layers or vias. |
| FPGA:                         | Prefabricated chips customized with loadable latches or fuses.                                  |
| Microprocessor:               | Instruction set interpreter customized through software.                                        |
| Domain Specific Processor:    | Special instruction set interpreters (ex: DSP, NP, GPU).                                        |

These days, “ASIC” almost always means Standard-cell.

What are the important metrics of comparison?

# FPGA versus ASIC



- **ASIC:** Higher NRE costs (10's of \$M). Relatively Low cost per die (10's of \$ or less).
- **FPGAs:** Low NRE costs. Relatively low silicon efficiency ⇒ high cost per part (> 10's of \$ to 1000's of \$).
- **Cross-over volume** from cost effective FPGA design to ASIC was often in the 100K range.
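The cross-over volume is where the two total-cost lines meet: NRE plus per-unit cost times volume is equal for both options. A minimal sketch, assuming hypothetical costs chosen inside the ranges quoted above (the function name and the specific dollar figures are illustrative only):

```python
def crossover_volume(asic_nre, asic_unit, fpga_nre, fpga_unit):
    """Volume at which total ASIC cost equals total FPGA cost."""
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


# Hypothetical: ASIC with $10M NRE and $10 per die; FPGA with no NRE and $110 per part.
volume = crossover_volume(asic_nre=10e6, asic_unit=10.0, fpga_nre=0.0, fpga_unit=110.0)
print(volume)  # 100000.0
```

Below this volume the FPGA is cheaper overall; above it the ASIC's lower unit cost pays back its NRE.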

# Hardware Description Languages

- Basic Idea:
  - Language constructs describe circuits with two basic forms:
    - **Structural descriptions:** connections of components. Nearly one-to-one correspondence with a schematic diagram.
    - **Behavioral descriptions:** use high-level constructs (similar to conventional programming) to describe the circuit function.
- Originally invented for simulation.
  - “Logic synthesis” tools exist to automatically convert to gate-level representation.
  - High-level constructs greatly improve designer productivity.
  - However, this may lead you to falsely believe that hardware design can be reduced to writing programs\*

“Structural” example:

```
Decoder(output x0,x1,x2,x3;  
        inputs a,b)  
{  
    wire abar, bbar;  
    inv(bbar, b);  
    inv(abar, a);  
    and(x0, abar, bbar);  
    and(x1, abar, b );  
    and(x2, a,      bbar);  
    and(x3, a,      b );  
}
```

“Behavioral” example:

```
Decoder(output x0,x1,x2,x3;  
        inputs a,b)  
{  
    case [a b]  
        00: [x0 x1 x2 x3] = 0x8;  
        01: [x0 x1 x2 x3] = 0x4;  
        10: [x0 x1 x2 x3] = 0x2;  
        11: [x0 x1 x2 x3] = 0x1;  
    endcase  
}
```

Warning: this is a fake HDL!

\*Describing hardware with a language is similar, however, to writing a parallel program.
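Since the two decoder descriptions are written in a fake HDL, they cannot be simulated directly. The Python sketch below models each one (gate by gate for the structural form, a lookup for the behavioral form) and checks that they describe the same function; the function names are invented for this illustration.

```python
def decoder_structural(a, b):
    """Mirror the gate netlist: two inverters feeding four AND gates."""
    abar = 1 - a
    bbar = 1 - b
    return (abar & bbar, abar & b, a & bbar, a & b)      # (x0, x1, x2, x3)


def decoder_behavioral(a, b):
    """Mirror the case statement: [x0 x1 x2 x3] = 0x8, 0x4, 0x2, 0x1."""
    patterns = {(0, 0): 0x8, (0, 1): 0x4, (1, 0): 0x2, (1, 1): 0x1}
    word = patterns[(a, b)]
    return tuple((word >> shift) & 1 for shift in (3, 2, 1, 0))


same = all(
    decoder_structural(a, b) == decoder_behavioral(a, b)
    for a in (0, 1) for b in (0, 1)
)
print(same)  # True
```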

# Review - Ripple Adder Example

```
module FullAdder(a, b, ci, r, co);
    input a, b, ci;
    output r, co;

    assign r = a ^ b ^ ci;
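    // Carry out is the majority of (a, b, ci). OR is written with "|";
    // "+" is arithmetic addition in Verilog and binds tighter than "&".
    assign co = (a & ci) | (a & b) | (b & ci);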

endmodule
```



```
module Adder(A, B, R);
    input [3:0] A;
    input [3:0] B;
    output [4:0] R;

    wire c1, c2, c3;
    // One instantiation statement: instances are separated by commas.
    FullAdder
    add0(.a(A[0]), .b(B[0]), .ci(1'b0), .co(c1), .r(R[0])),
    add1(.a(A[1]), .b(B[1]), .ci(c1), .co(c2), .r(R[1])),
    add2(.a(A[2]), .b(B[2]), .ci(c2), .co(c3), .r(R[2])),
    add3(.a(A[3]), .b(B[3]), .ci(c3), .co(R[4]), .r(R[3]));

endmodule
```

Figure: block diagram of the 4-bit ripple adder. Four full adders (FA) are chained: the FA for bit 0 has its carry input tied to 0, each carry output (*c<sub>1</sub>*, *c<sub>2</sub>*, *c<sub>3</sub>*) feeds the carry input of the next higher bit, and the carry output of the bit-3 FA becomes *R<sub>4</sub>*.
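A quick way to convince yourself that the full-adder equations and the carry chain are right is to model them in software and compare against ordinary addition. This is a sketch only; `full_adder`, `ripple_add`, and the width argument `n` are names chosen here.

```python
def full_adder(a, b, ci):
    """One-bit full adder: sum is the XOR of the inputs, carry is their majority."""
    r = a ^ b ^ ci
    co = (a & ci) | (a & b) | (b & ci)
    return r, co


def ripple_add(a_val, b_val, n=4):
    """Chain n full adders, carry rippling from bit 0 upward; returns n+1 bits."""
    carry = 0                                   # carry into bit 0 is tied to 0
    result = 0
    for i in range(n):
        a_bit = (a_val >> i) & 1
        b_bit = (b_val >> i) & 1
        r, carry = full_adder(a_bit, b_bit, carry)
        result |= r << i
    return result | (carry << n)                # final carry becomes the top bit


all_match = all(ripple_add(a, b) == a + b for a in range(16) for b in range(16))
print(all_match)  # True
```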

# Example - Ripple Adder Generator

Parameters give us a way to generalize our designs. A module becomes a “generator” for different variations. Enables design/module reuse. Can simplify testing.

```
module Adder(A, B, R);  
    parameter N = 4;  
    input [N-1:0] A;  
    input [N-1:0] B;  
    output [N:0] R;  
    wire [N:0] C;  
  
    genvar i;  
  
    generate  
        for (i=0; i<N; i=i+1) begin:bit  
            FullAdder add(.a(A[i]), .b(B[i]), .ci(C[i]), .co(C[i+1]), .r(R[i]));  
        end  
    endgenerate  
  
    assign C[0] = 1'b0;  
    assign R[N] = C[N];  
endmodule
```

*Declare a parameter with default value.*

*Note: this is not a port. Acts like a “synthesis-time” constant.*

*Replace all occurrences of “4” with “N”.*

*variable exists only in the specification - not in the final circuit.*

*Keyword that denotes synthesis-time operations*

*For-loop creates instances (with unique names)*

```
Adder adder4 ( ... );  
Adder #(.N(64)) adder64 ( ... );
```

*Overwrite parameter N at instantiation.*

# EECS151 Registers

- All registers are “N” bits wide - the value of N is specified at instantiation
- All positive edge triggered.



```
module REGISTER(q, d, clk);  
  parameter N = 1;
```



```
module REGISTER_CE(q, d, ce, clk);  
  parameter N = 1;
```

*On the rising clock edge, if clock enable (ce) is 0 then the register is disabled (its state will not be changed).*



```
module REGISTER_R(q, d, rst, clk);  
  parameter N = 1;  
  parameter INIT = 1'b0;
```

*On the rising clock edge if reset (rst) is 1 then the state is set to the value of INIT. Default INIT value is all 0's.*



```
module REGISTER_R_CE(q, d, rst, ce, clk);  
  parameter N = 1;  
  parameter INIT = 1'b0;
```

*Reset (rst) has priority over clock enable (ce).*
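The bodies of these library modules are not shown here, so the following is only a behavioral sketch, in Python, of what the descriptions above say happens at a rising clock edge. The name `register_next` and its argument defaults are invented for this illustration.

```python
def register_next(q, d, rst=0, ce=1, init=0):
    """State after a rising clock edge, following the REGISTER_R_CE description.

    Leaving rst=0 models a register without reset; leaving ce=1 models one
    without a clock enable.
    """
    if rst:          # reset has priority over clock enable
        return init
    if ce:           # enabled: capture the data input
        return d
    return q         # disabled: hold the current state


print(register_next(5, 9), register_next(5, 9, ce=0), register_next(5, 9, rst=1, ce=0))
# 9 5 0
```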

# 4-bit wrap-around counter

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,  
11, 12, 13, 14, 15, 0, 1, ...



```
module counter(value, enable, reset, clk);
    output [3:0] value;
    input      enable, reset, clk;
    wire [3:0]  next;
    REGISTER_R #(4) state (.q(value), .d(next), .rst(reset), .clk(clk));
    assign next = value + 1;
endmodule // counter
```
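The wrap-around comes for free from the 4-bit width: `value + 1` is truncated to four bits, so 15 + 1 becomes 0. A small software model of one clock edge makes this visible (a sketch; `counter_step` and `width` are names chosen here, and the reset value of 0 is the register's default INIT):

```python
def counter_step(value, reset, width=4):
    """Value held by the state register after one rising clock edge."""
    if reset:
        return 0
    return (value + 1) % (1 << width)   # next = value + 1, truncated to `width` bits


value = counter_step(0, reset=True)
trace = []
for _ in range(18):
    trace.append(value)
    value = counter_step(value, reset=False)
print(trace[-3:])  # [15, 0, 1]
```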

# FPGA Overview

- Basic idea: two-dimensional array of logic blocks and flip-flops with a means for the user to configure (program):
  1. the interconnection between the logic blocks,
  2. the function of each block.



*Simplified version of FPGA internal architecture*

# User Programmability

- Latch-based (Xilinx, Intel/Altera, ...)



- Latches are used to:

1. control a switch to make or break cross-point connections in the interconnect
2. define the function of the logic blocks
3. set user options:
  - within the logic blocks
  - in the input/output blocks
  - global reset/clock

- “Configuration bit stream” is loaded under user control

# 4-LUT Implementation



- n-bit LUT is implemented as a  $2^n \times 1$  memory:
  - inputs choose one of  $2^n$  memory locations.
  - memory locations (latches) are normally loaded with values from user's configuration bit stream.
  - Inputs to mux control are the CLB inputs.
- Result is a general purpose “logic gate”.
  - n-LUT can implement any function of n inputs!
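The "memory plus mux" view of a LUT translates directly into a list lookup. In this sketch the configuration bits are computed from an arbitrary Python function, standing in for what the configuration bit stream would load; `program_lut` and `lut_read` are names invented here.

```python
def program_lut(func, n):
    """Return the 2**n configuration bits that make an n-LUT compute func."""
    bits = []
    for addr in range(2 ** n):
        inputs = [(addr >> i) & 1 for i in range(n)]   # inputs[i] is address bit i
        bits.append(func(*inputs) & 1)
    return bits


def lut_read(config_bits, *inputs):
    """Use the inputs as a memory address (input 0 is the least significant bit)."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return config_bits[addr]


# Example: program a 4-LUT as a 4-input XOR, then as a 4-input AND.
xor4 = program_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
and4 = program_lut(lambda a, b, c, d: a & b & c & d, 4)
print(len(xor4), lut_read(xor4, 1, 0, 1, 1), lut_read(and4, 1, 1, 1, 1))  # 16 1 1
```

The same 16-entry structure serves both functions; only the stored bits differ.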

# Example Partition, Placement, and Route



## Example Circuit:

- collection of gates and flip-flops



Two partitions. Each has single output, no more than 4 inputs, and no more than 1 flip-flop. In this case, inverter goes in both partitions.

Note: the partition can be arbitrarily large as long as it has no more than 4 inputs, 1 output, and 1 flip-flop.

# *Some Laws of Boolean Algebra*

Duality: A dual of a Boolean expression is derived by interchanging OR and AND operations, and 0s and 1s (literals are left unchanged).

$$\{F(x_1, x_2, \dots, x_n, 0, 1, +, \bullet)\}^D = \{F(x_1, x_2, \dots, x_n, 1, 0, \bullet, +)\}$$

Any law that is true for an expression is also true for its dual.

Operations with 0 and 1:

$$\begin{array}{ll} x + 0 = x & x \cdot 1 = x \\ x + 1 = 1 & x \cdot 0 = 0 \end{array}$$

Idempotent Law:

$$x + x = x \quad x \cdot x = x$$

Involution Law:

$$(x')' = x$$

Laws of Complementarity:

$$x + x' = 1 \quad x \cdot x' = 0$$

Commutative Law:

$$x + y = y + x \quad x \cdot y = y \cdot x$$

# *Algebraic Simplification*

$$\begin{aligned}
\text{Cout} &= a'bc + ab'c + abc' + abc \\
&= a'bc + ab'c + abc' + \textcolor{red}{abc} + abc \\
&= a'bc + \textcolor{red}{abc} + ab'c + abc' + \textcolor{red}{abc} \\
&= [\textcolor{red}{a'} + a]bc + ab'c + abc' + abc \\
&= [\textcolor{red}{1}]bc + ab'c + abc' + abc \\
&= bc + ab'c + abc' + \textcolor{red}{abc} + abc \\
&= bc + ab'c + \textcolor{red}{abc} + abc' + abc \\
&= bc + \textcolor{red}{a(b' + b)c} + abc' + abc \\
&= bc + \textcolor{red}{a[1]c} + abc' + abc \\
&= bc + ac + ab[c' + c] \\
&= bc + ac + ab[\textcolor{red}{1}] \\
&= bc + ac + ab
\end{aligned}$$
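An algebraic simplification like this can always be double-checked by brute force, since three inputs give only eight cases. A minimal sketch (function names chosen here):

```python
from itertools import product


def cout_minterms(a, b, c):
    """Starting expression: a'bc + ab'c + abc' + abc."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (na & b & c) | (a & nb & c) | (a & b & nc) | (a & b & c)


def cout_simplified(a, b, c):
    """Final expression: bc + ac + ab."""
    return (b & c) | (a & c) | (a & b)


equivalent = all(
    cout_minterms(*v) == cout_simplified(*v) for v in product((0, 1), repeat=3)
)
print(equivalent)  # True
```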

# Canonical Forms

- Standard form for a Boolean expression - unique algebraic expression directly from a truth table (TT) description.
- Two Types:
  - **Sum of Products (SOP)**
  - **Product of Sums (POS)**
- Sum of Products (disjunctive normal form, minterm expansion). Example:

| Minterms | a | b | c | f | f' |
|----------|---|---|---|---|----|
| $a'b'c'$ | 0 | 0 | 0 | 0 | 1  |
| $a'b'c$  | 0 | 0 | 1 | 0 | 1  |
| $a'bc'$  | 0 | 1 | 0 | 0 | 1  |
| $a'bc$   | 0 | 1 | 1 | 1 | 0  |
| $ab'c'$  | 1 | 0 | 0 | 1 | 0  |
| $ab'c$   | 1 | 0 | 1 | 1 | 0  |
| $abc'$   | 1 | 1 | 0 | 1 | 0  |
| $abc$    | 1 | 1 | 1 | 1 | 0  |

One product (and) term for each 1 in f:  
$f = a'bc + ab'c' + ab'c + abc' + abc$  
$f' = a'b'c' + a'b'c + a'bc'$
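Reading a sum-of-products expression off a truth table is mechanical: emit one minterm per row whose output is 1. The sketch below does this for the `f` and `f'` columns of the table above; `sum_of_products` is a name chosen for this illustration.

```python
from itertools import product


def sum_of_products(names, outputs):
    """Build the canonical SOP string from a truth-table output column.

    Rows are assumed to be in counting order, with the first name as the
    most significant bit (as in the table: a, b, c).
    """
    terms = []
    for values, out in zip(product((0, 1), repeat=len(names)), outputs):
        if out:
            terms.append("".join(n if v else n + "'" for n, v in zip(names, values)))
    return " + ".join(terms)


f_column = [0, 0, 0, 1, 1, 1, 1, 1]
f_bar_column = [1 - bit for bit in f_column]
print(sum_of_products("abc", f_column))      # a'bc + ab'c' + ab'c + abc' + abc
print(sum_of_products("abc", f_bar_column))  # a'b'c' + a'b'c + a'bc'
```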

*What is the cost?*

# Karnaugh Map Method

- Adjacent groups of 1's represent product terms

| b \ a | 0 | 1 |
|-------|---|---|
| 0     | 0 | 1 |
| 1     | 0 | 1 |

$$f = a$$

| b \ a | 0 | 1 |
|-------|---|---|
| 0     | 1 | 1 |
| 1     | 0 | 0 |

$$g = b'$$

| c \ ab | 00 | 01 | 11 | 10 |
|--------|----|----|----|----|
| 0      | 0  | 0  | 1  | 0  |
| 1      | 0  | 1  | 1  | 1  |

$$\text{cout} = ab + bc + ac$$

| c \ ab | 00 | 01 | 11 | 10 |
|--------|----|----|----|----|
| 0      | 0  | 0  | 1  | 1  |
| 1      | 0  | 0  | 1  | 1  |

$$f = a$$

# Multi-level Combinational Logic

Another Example:  $F = abc + abd + a'c'd' + b'c'd'$



$$\text{let } x = ab \quad y = c+d$$

$$F = xy + x'y'$$
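To see that the factored form is the same function, substitute $x = ab$ and $y = c + d$ back in and apply De Morgan's laws to the complemented terms:

$$\begin{aligned}
xy + x'y' &= ab(c + d) + (ab)'(c + d)' \\
&= abc + abd + (a' + b')(c'd') \\
&= abc + abd + a'c'd' + b'c'd'
\end{aligned}$$

The two-level form needs four 3-input product terms; the multi-level form computes $x$ and $y$ once and reuses each of them twice.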



Incorporates fanout.

No convenient hand methods exist for multi-level logic simplification:

- a) CAD Tools use sophisticated algorithms and heuristics  
Guess what? These problems tend to be NP-complete
- b) Humans and tools often exploit some special structure (example adder)

# **NAND-NAND & NOR-NOR Networks**

- Mapping from AND/OR to NAND/NAND



# Finite State Machines (FSMs)

- **FSM** circuits are a type of *sequential circuit*:
  - output depends on present *and* past inputs
    - effect of past inputs is represented by the current **state**
- Behavior is represented by **State Transition Diagram**:
  - traverse one edge per clock cycle.



# Formal Design Process (3,4)

State Transition Table:

| present state | OUT | IN | next state |
|---------------|-----|----|------------|
| EVEN          | 0   | 0  | EVEN       |
| EVEN          | 0   | 1  | ODD        |
| ODD           | 1   | 0  | ODD        |
| ODD           | 1   | 1  | EVEN       |

Invent a code to represent states:

Let 0 = EVEN state, 1 = ODD state

| present state (ps) | OUT | IN | next state (ns) |
|--------------------|-----|----|-----------------|
| 0                  | 0   | 0  | 0               |
| 0                  | 0   | 1  | 1               |
| 1                  | 1   | 0  | 1               |
| 1                  | 1   | 1  | 0               |



Derive logic equations from table (how?):

$$OUT = PS$$

$$NS = PS \text{ xor } IN$$
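The two equations are enough to simulate the machine. The sketch below steps the parity FSM over an input bit sequence, starting in EVEN (0); the function name and the choice to record OUT before each clock edge are assumptions made for this illustration.

```python
def parity_fsm(bits):
    """Run the parity FSM; returns (OUT seen in each cycle, final state)."""
    ps = 0                      # start in EVEN
    outputs = []
    for bit in bits:
        outputs.append(ps)      # OUT = PS (depends on state only)
        ps = ps ^ bit           # NS = PS xor IN, taken at the clock edge
    return outputs, ps


outputs, final_state = parity_fsm([1, 0, 1, 1])
print(outputs, final_state)  # [0, 1, 1, 0] 1
```

Three 1s have been seen, so the machine ends in ODD (1).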

# FSM CL block rewritten

```
always @*  // * for sensitivity list
begin
    next_state = IDLE;
    out = 1'b0;
    case (state)
        IDLE : if (in == 1'b1) next_state = S0;
        S0   : if (in == 1'b1) next_state = S1;
        S1   : begin
            out = 1'b1;
            if (in == 1'b1) next_state = S1;
        end
        default: ;
    endcase
end
endmodule
```

**Normal values: used unless specified below.**



**Within case only need to specify exceptions to the normal values.**

**Note: The use of “blocking assignments” allows signal values to be “rewritten”, simplifying the specification.**

# FSM Recap

## Moore Machine



## Mealy Machine



*Both machine types allow one-hot implementations.*

# One-hot encoded combination lock



# Final product ...

Vd



Vs

"The planar process"

Jean Hoerni,  
Fairchild  
Semiconductor  
1958

Top-down view:



# Physical Layout



- How do transistor circuits get “laid out” as geometry?
- What circuit does a physical layout implement?
- Where are the transistors and wires and contacts and vias?



MOSFET scaling  
(process nodes)

10  $\mu\text{m}$  – 1971  
6  $\mu\text{m}$  – 1974  
3  $\mu\text{m}$  – 1977  
1.5  $\mu\text{m}$  – 1981  
1  $\mu\text{m}$  – 1984  
800 nm – 1987  
600 nm – 1990  
350 nm – 1993  
250 nm – 1996  
180 nm – 1999  
130 nm – 2001  
90 nm – 2003  
65 nm – 2005  
45 nm – 2007  
32 nm – 2009  
22 nm – 2012  
14 nm – 2014  
10 nm – 2016  
7 nm – 2018  
5 nm – 2020

Future  
3 nm – ~2022  
2 nm – >2023

# State of the art

\* From Wikipedia

## ► 7nm

As of September 2018, mass production of 7 nm devices has begun. The first mainstream 7 nm mobile processor intended for mass market use, the Apple A12 Bionic, was released at their September 2018 event. Although Huawei announced its own 7 nm processor before the Apple A12 Bionic, the Kirin 980 on August 31, 2018, the Apple A12 Bionic was released for public, mass market use to consumers before the Kirin 980. Both chips are manufactured by TSMC. On July 7, 2019, AMD officially launched their Ryzen 3000 series of central processing units, based on the TSMC 7 nm process and Zen 2 microarchitecture.

## ► 5nm

In October 2019, TSMC started sampling 5nm A14 processors for Apple. In December 2019, TSMC announced an average yield of ~80%, with a peak yield per wafer of >90% for their 5nm test chips with a die size of 17.92 mm<sup>2</sup>. In mid 2020 TSMC claimed its (N5) 5nm process offered 1.8x the density of its 7nm N7 process, with 15% speed improvement or 30% lower power consumption; an improved sub-version (N5P) was claimed to improve on N5 with +5% speed or -10% power.<sup>[19]</sup>

On October 13, 2020, Apple announced a new iPhone 12 lineup using the A14, together with the Huawei Mate 40 lineup using the HiSilicon Kirin 9000, which were the first devices to be commercialized on TSMC's 5nm node. Later, on November 10, 2020, Apple also revealed three new Mac models using the Apple M1, another 5nm chip. According to Semianalysis, the A14 processor has a transistor density of 134 million transistors per mm<sup>2</sup>.

## ► 3nm

As of 2019, Intel, Samsung, and TSMC have all announced plans to put a 3 nm semiconductor node into commercial production. Samsung's 3 nm process is based on GAAFET (gate-all-around field-effect transistor) technology, a type of multi-gate MOSFET technology, while TSMC's 3nm process will still use FinFET (fin field-effect transistor) technology,<sup>[1]</sup> despite TSMC developing GAAFET transistors.

# Complex CMOS Gate

$$OUT = \overline{D + A \cdot (B + C)}$$

$$OUT = \overline{D \cdot A + B \cdot C}$$



# 4-to-1 Transmission-gate Mux



- The series connection of pass-transistors in each branch effectively forms the AND of  $s_1$  and  $s_0$  (or their complement).
- Compare cost to logic gate implementation

*Any better solutions?*

# Tri-state Buffers

Tri-state Buffer:



| OE | IN | OUT |
|----|----|-----|
| 0  | 0  | Z   |
| 0  | 1  | Z   |
| 1  | 0  | 0   |
| 1  | 1  | 1   |

"high  
impedance" (output  
disconnected)

Variations:



| OE | IN | OUT |
|----|----|-----|
| 0  | -  | Z   |
| 1  | 0  | 1   |
| 1  | 1  | 0   |

Inverting buffer



| OE | IN | OUT |
|----|----|-----|
| 0  | 0  | 0   |
| 0  | 1  | 1   |
| 1  | -  | Z   |

Inverted enable



transmission gate  
useful in  
implementation



# *Latches and Flip-flops*

## Positive Level-sensitive *latch*:



Positive Edge-triggered **flip-flop**  
built from two level-sensitive  
latches:



## Latch Implementation:



# Example



Parallel to serial  
converter circuit

$$T \geq \text{time}(\text{clk} \rightarrow \text{Q}) + \text{time}(\text{mux}) + \text{time}(\text{setup})$$

$$T \geq \tau_{\text{clk} \rightarrow \text{Q}} + \tau_{\text{mux}} + \tau_{\text{setup}}$$
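As a worked instance with made-up numbers (the delays below are hypothetical, not taken from the lecture), suppose $\tau_{\text{clk} \rightarrow \text{Q}} = 100\,\text{ps}$, $\tau_{\text{mux}} = 150\,\text{ps}$, and $\tau_{\text{setup}} = 50\,\text{ps}$:

$$T \geq 100\,\text{ps} + 150\,\text{ps} + 50\,\text{ps} = 300\,\text{ps} \quad \Rightarrow \quad f_{\max} = \frac{1}{300\,\text{ps}} \approx 3.33\,\text{GHz}$$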

# Inverter with Load Capacitance

$$t_p = 0.69(R_N/W)(C_{int} + C_L)$$



$$= 0.69(R_N/W)(3W\gamma C_G + C_L)$$

$$= 0.69(3\gamma R_N C_G) \left(1 + \frac{C_L}{\gamma C_{in}}\right)$$

$$= t_{inv}(1 + \frac{C_L}{\gamma C_{in}}) = t_{p0}(1 + f/\gamma)$$

$f = \text{fanout} = \text{ratio between load and input capacitance of gate}$
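The final form $t_{p0}(1 + f/\gamma)$ is easy to evaluate for different loads. A minimal sketch, assuming hypothetical values for $t_{p0}$ and $\gamma$ (neither number comes from the lecture) and taking $C_{in} = 3WC_G$ as the substitution in the derivation implies:

```python
def inverter_delay(t_p0, c_load, c_in, gamma=1.0):
    """Inverter delay t_p = t_p0 * (1 + f / gamma), with fanout f = C_L / C_in."""
    fanout = c_load / c_in
    return t_p0 * (1 + fanout / gamma)


# Hypothetical: t_p0 = 10 ps, load equal to four input capacitances, gamma = 1.
print(inverter_delay(10.0, c_load=4.0, c_in=1.0))  # 50.0
```

With no external load the delay is just $t_{p0}$; each additional unit of fanout adds $t_{p0}/\gamma$.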

# Wire Delay

- Even in those cases where the transmission line effect is negligible:

- Wires possess distributed resistance and capacitance



- Time constant associated with distributed RC is proportional to the square of the length

- For **short wires** on ICs, resistance is insignificant (relative to effective R of transistors), but C is important.
  - Typically around half of C of gate load is in the wires.
- For **long wires** on ICs:
  - busses, clock lines, global control signal, etc.
  - Resistance is significant, therefore distributed RC effect dominates.
  - signals are typically “rebuffered” to reduce delay:



# How to retime logic

Critical path is 5.  
We want to improve  
it without changing  
circuit semantics.

Circles are combinational  
logic, labelled with delays.



Figure 1: A small graph before retiming. The nodes represent logic delays, with the inputs and outputs passing through mandatory, fixed registers. The critical path is 5.

Add a register, move  
one circle.  
Clock period  
improves by 20% (5 → 4).



Figure 2: The example in Figure 1 after retiming. The critical path is reduced from 5 to 4.

Logic Synthesis tools can do this in  
simple cases.

# Gate Driving long wire and other gates



$$R_w = r_w L, \quad C_w = c_w L$$

$$\begin{aligned} t_p &= 0.69R_{dr}C_{int} + 0.69R_{dr}C_w + 0.38R_wC_w + 0.69R_{dr}C_{fan} + 0.69R_wC_{fan} \\ &= 0.69R_{dr}(C_{int} + C_{fan}) + 0.69(R_{dr}c_w + r_wC_{fan})L + 0.38r_wc_wL^2 \end{aligned}$$
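The two lines of the expression can be checked against each other numerically, which also shows how the $L^2$ term takes over for long wires. The parameter values below are hypothetical and in arbitrary consistent units; the function names are chosen for this sketch.

```python
import math


def delay_by_terms(r_dr, c_int, c_fan, r_w, c_w, length):
    """First line of the expression: lumped wire totals R_w and C_w."""
    R_w, C_w = r_w * length, c_w * length
    return (0.69 * r_dr * c_int + 0.69 * r_dr * C_w + 0.38 * R_w * C_w
            + 0.69 * r_dr * c_fan + 0.69 * R_w * c_fan)


def delay_by_length(r_dr, c_int, c_fan, r_w, c_w, length):
    """Second line: constant, linear, and quadratic terms in wire length L."""
    constant = 0.69 * r_dr * (c_int + c_fan)
    linear = 0.69 * (r_dr * c_w + r_w * c_fan) * length
    quadratic = 0.38 * r_w * c_w * length ** 2
    return constant + linear + quadratic


# Hypothetical values, arbitrary consistent units.
params = dict(r_dr=1000.0, c_int=2.0, c_fan=4.0, r_w=0.1, c_w=0.2)
same = all(
    math.isclose(delay_by_terms(length=L, **params), delay_by_length(length=L, **params))
    for L in (1.0, 10.0, 100.0, 1000.0)
)
print(same)  # True
```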

# Driving Large Loads

- Large fanout nets: clocks, resets, memory bit lines, off-chip
- Relatively small driver results in long rise time (and thus large gate delay)



Staged Buffers



- How to optimally scale drivers?
- Optimal trade-off between delay per stage and total number of stages?

# 2-input NAND Gate



Solve:

$$C_{int} = (6/2)C_D + (3/2)C_D = (9/2)C_D$$

$$\begin{aligned}
 t_p &= 0.69 \cdot 2 \left( \frac{2R_N}{3W} \right) (C_{int} + C_L) \\
 &= 0.69 \left( \frac{4R_N}{3W} \right) \left( \frac{9}{2}\gamma WC_G + C_L \right) \\
 &= 0.69 \left( \frac{R_N}{W} \right) \left( 6\gamma WC_G + \frac{4}{3}C_L \right) \\
 &= 0.69 \left( \frac{R_N}{W} \right) 3\gamma WC_G \left( 2 + \frac{4C_L}{3\gamma (3WC_G)} \right) \\
 &= [0.69 \cdot 3R_N\gamma C_G] \left( 2 + \frac{4C_L}{3\gamma C_{IN}} \right) \\
 &= t_{p0} \left( 2 + \frac{4f}{3\gamma} \right)
 \end{aligned}$$
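Putting the NAND result next to the inverter result, $t_{p0}(1 + f/\gamma)$, shows the cost of the series stack: both the zero-load delay and the per-fanout slope are larger. A small sketch with hypothetical values ($t_{p0} = 1$, $\gamma = 1$, chosen only for illustration):

```python
def inverter_delay(t_p0, fanout, gamma=1.0):
    """Inverter: t_p = t_p0 * (1 + f / gamma)."""
    return t_p0 * (1 + fanout / gamma)


def nand2_delay(t_p0, fanout, gamma=1.0):
    """2-input NAND: t_p = t_p0 * (2 + 4 f / (3 gamma))."""
    return t_p0 * (2 + 4 * fanout / (3 * gamma))


# Delays in units of t_p0 at a fanout of 3.
print(inverter_delay(1.0, 3), nand2_delay(1.0, 3))  # 4.0 6.0
```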