

## Advantage of CPLD

- Low capacitance
- High speed operation.

Drawbacks:

Voltage drop problem can be solved by using the following methods.

i) Level restoration

ii) Multiple threshold transistor

iii) Transmission gate logic.

## VLSI and chip design

### ASSIGNMENT

#### 1. Dynamic Latches and Registers

Introduction:

\* Dynamic circuits based on temporary storage of charge on parasitic capacitors.

\* Charge stored on a capacitor can be used represent a logic signal.

\* The absence of charge denotes '0', while its presence stands for a stored '1'.

\* In capacitor leakage is always present, so the stored value can hence only be kept for a limit amount of time (milliseconds).

Types:

i. Dynamic Transmission - Gate Edge-triggered Registers

ii. C<sup>2</sup> MOS - A clock skew insensitive Approach

iii. True single - phase clocked Register (TSPCR)

## 1. Dynamic transmission-gate Edge-triggered Registers.



\* When  $C1K=0$ , the transmission gate  $T_1$  turns ON, which has capacitance  $C_1$  and the junction capacitance of  $T_1$ , and the overlap gate capacitance of  $T_1$ .

\* During this period, the slave stage ( $T_2$ ) is in a hold mode, with node 2 in high impedance (floating) state.

\* When  $C1K=1$ , the transmission gate  $T_2$  turns on and the node 1 value propagates to the op a (node 1 is stable during the high phase of the clock when  $T_1$  is OFF).

\* Node 2 now stores the inverted version of node 1.

↳ Impact non-overlapping clocks



\* During the 0-0 overlap period, the PMOS of  $T_1$  are simultaneously ON, creating a direct path for data to flow from D-input of the register to the Q output. This is known as race condition.

\* The output Q can be change on the falling edge IF the overlap period is large.

\* During 1-1 overlap region, where an input-output path exists through the NMOS of  $T_1$  and NMOS of  $T_2$ . The later case is taken care off by enforcing

a hold time constraint. That is the data must be stable during the high-high overlap period.

\* constraint for the 0-0 overlap is given as:

$$t_{overlap\ 0-0} < t_{n} + t_{T_2} + t_{T_2}$$

\* constraint for the 1-1 overlap is given as:

$$t_{hold} > t_{overlap\ 1-1}$$

2. C<sup>2</sup>MOS - A clock-skew insensitive approach

↳ 1. C<sup>2</sup>MOS master-slave positive edge-triggered register



\* IF CLK=0, the first tri-state driver is ON and the master stage act as an inverter.

\* The master stage (M<sub>3</sub>-M<sub>4</sub>) is in the evaluation mode and the slave stage is in a hold mode.

\* Both transistors M<sub>1</sub> and M<sub>8</sub> are OFF, decoupling the output from the input. The output Q retains its previous value stored in CL<sub>1</sub> - CL<sub>2</sub>.

\* IF CLK=1, the master stage is hold mode (M<sub>3</sub>-M<sub>4</sub> OFF) slave stage evaluates (M<sub>7</sub>-M<sub>8</sub> ON).

\* The value stored on CL<sub>1</sub> propagates through the output node through the slave stage.

C<sup>2</sup>MOS avoids race conditions:



$C^2$ MOS avoids race condition by disconnecting direct path to the output, so this approach is insensitive to clock overlap.

↳  $C^2$ MOS based dual edge Registers



\* This sequential circuit samples the input on both clock edges.  
 \* It consists of two parallel master-slave based edge-triggered registers, whose outputs are multiplexed using the tri-state drivers.

\* When clock is high, the positive latch transistors M<sub>1</sub>-M<sub>4</sub> sample the inverted D input on node X. Node Y is held stable, since devices M<sub>9</sub> and M<sub>10</sub> are turned off.

\* When clock is low, the top slave latch M<sub>5</sub>-M<sub>8</sub> is ON, and drives the inverted value of X to the Q output.

\* When clock is low, transistors (M<sub>9</sub>, M<sub>10</sub>) are turned on, sampling the inverted D input on node Y.

\* On the rising edge, the bottom slave latch conducts and drives the inverted version of Y on node Q. Data hence changes on both sides.

### 3 True single-Phase clocked Register (TSPCR)



\*It uses single phase clock.

\*For the positive latch, when CLK is high, the latch is in the transparent mode. The latch is non inverting state and propagates the input to output.

\*When CLK=0, both inverters are disabled and the latch is hold mode. Pull-up networks are activated and the pull-down circuits are deactivated.

\*As a result, no signal can ever propagate from the input of the latch to the output in this mode.

### 2. Pipeling:

Pipelining is design technique used to accelerate the operation of the datapaths in digital processors.

Example [Reference circuit compute the  $\log(a+b)$ ]



\*Where both a and b represent streams of numbers.

\*Registers are edge triggered D registers.

\*The minimal clock period  $T_{min}$  is

$$T_{min} = t_{req} + t_{pd} + t_{setup}$$

- \* The delay is generally larger than the delays in registers and dominates the circuit performance
- \* If each module has an propagation delay, then each logic module is then active for only  $\frac{1}{3}$  of the clock period (If the delay of the register is ignored).
- \* For example, the adder unit is active during the first third of the period.
- \* Remains idle during other  $\frac{2}{3}$  of the period.

Pipeline circuit compute  $\log(10tb1)$



- \* Pipeling is a technique to improve the resource utilization and increase the functional throughput.
- \* Use registers between the logic blocks.
- \* The computation for one set of input data is spread over a number of clock periods.

Pipelined computation

| clock period | Adder       | Absolute value | Logarithm           |
|--------------|-------------|----------------|---------------------|
| 1            | $a_1 + b_1$ |                |                     |
| 2            | $a_2 + b_2$ | $ a_1 + b_1 $  |                     |
| 3            | $a_3 + b_3$ | $ a_2 + b_2 $  | $\log( a_1 + b_1 )$ |
| 4            | $a_4 + b_4$ | $ a_3 + b_3 $  | $\log( a_2 + b_2 )$ |
| 5            | $a_5 + b_5$ | $ a_4 + b_4 $  | $\log( a_3 + b_3 )$ |

- \* The result for the data set  $(a_1, b_1)$  only appears at the output after three clock periods.
- \* At that time circuit has already performs parts of the computation for the next data sets  $(a_2, b_2)$  and  $(a_3, b_3)$ .
- \* The advantage of pipelined operation is need maximum clock period.
- \* The combinational circuit block has been partitioned in to three sections, each of which has a smaller propagation delay than the original function.
- \* The minimum allowable clock period is

$$T_{\min \cdot \text{pipe}} = t_{c-q} + \max(t_{pd-\text{add}}, t_{pd-\text{abs}}, t_{pd-\text{log}})$$

#### 1. Latch vs Register based pipelines



- \* The pipeline system is implemented based on pass-transistor-based positive and negative latches instead of edge triggered registers.
- \* Logic is introduced between the master and slave latches of a master-slave system.
- \* When the clocks  $\text{clk}$  and  $\overline{\text{clk}}$  are non-overlapping, correct pipelined operation is obtained.
- \* Input data sampled on  $c_1$  at the positive edge of  $\text{clk}$  and the computation of logic block F starts and it stored on  $c_2$  on falling edge of  $\text{clk}$

and computation of logic block G starts.

\* The value stored on  $C_2$  at the end of the CLK low phase is the result of passing the previous input through the logic function F.

\* When overlap exists between CLK and  $\bar{CLK}$ , the next input already applied to F and its effect might propagate to  $C_2$  before  $\bar{CLK}$  goes low.

## 2. NORA - CMOS based pipelined structures.



\* The latch based pipeline circuit can also be implemented by using CMOS latches as shown above.

\* A CMOS based pipelined circuit is race free as long as all the logic functions F between the latches are non inverting.

\* During a (0-0) over-lap between CLK and  $\bar{CLK}$ , all CMOS latches, simplify to pure pull-up networks.



\* The only way a signal can race from stage to stage under this condition when the logic function F is inverting as shown above also it is illustrated below: Where F is replaced by a single, static CMOS inverter. Similar to consideration are

valid for the (1-1) overlap.



### NORA - CMOS

\* Based on this concept, a logic style - NORA-CMOS -

It combines CMOS pipeline registers and NORA dynamic logic function blocks.

\* Each module consists of a block of combinational logic that can be mixture of static and dynamic logic, followed by a CMOS latch.

\* Logic and latch are clocked in such a way that both simultaneously in either evaluation, or hold (pre-charge) mode.

\* A block that is in evaluation during CLK-1 is called a CLK-module, while the inverse is called a



\* A NORA datapath consists of a chain of alternating CLK and  $\overline{\text{CLK}}$  modules. While one class of modules is pre-charging with its output latch in hold mode, preserving the previous output value, the other class is

evaluating.

\* Data is passed in a pipeline fashion from module to module.

|       | CLK block |          | CLK block |          |
|-------|-----------|----------|-----------|----------|
|       | Logic     | Latch    | Logic     | Latch    |
| CLK=0 | Precharge | Hold     | Evaluate  | Evaluate |
| CLK=1 | Evaluate  | Evaluate | precharge | Hold     |

2 mark

1. Charge sharing problem in dynamic CMOS logic:

The charge sharing problem occurs when the charge which is stored at the output node in the pre-charge phase is shared among the junction capacitance of transistor during evaluation phase.

The charges sharing may degrade the output voltage or even cause an erroneous output value.

2. 4:1 mux using transmission gate:



### 3. Differentiate latches and FlipFlops

| Latches                                     | FlipFlops                                                       |
|---------------------------------------------|-----------------------------------------------------------------|
| Latches are level triggered devices         | FlipFlops are edge triggered devices.                           |
| Latches are faster as compared to FlipFlops | FlipFlops are slow compared to the latches.                     |
| Latches are constructed from logic gates    | FlipFlops are constructed from latches along with clock signals |

### 4. clock - to - Q delay

Flip-Flop propagation delay (or clock - to - Q delay): The time it takes for the FF output Q to be stable after the clock edge occurs. A value must be held stable during the decision.

### 5. setup time:

The setup time is the time that the data I/p (D) must be valid before the clock transition. The hold time ( $t_{hold}$ ) is the data i/p must remain valid the clock edge.

### 6. C<sup>2</sup>MOS Register:

A C<sup>2</sup>MOS (clocked CMOS) master - slave positive edge-triggered register is insensitive to clock overlaps because clock overlaps activate either the pull-up or pull down networks, but never both of them simultaneously.

### 7 Gilitch clock generation:



### 8 Clock skew:

The spatial time variation in arrival time of a clock transition is referred to as clock skew. Clock skew can be positive or negative depending upon the routing direction and position of the clock source.

Clock jitter refers to temporal variation of clock period at a given point - that is, the clock period can reduce or expand on a cycle-by-cycle basis.

### 9 Draw (a) static CMOS (b) Pseudo nMOS (c) dynamic inverters



(a)

(b)

(c)

### 10 Monotonicity problem:

While dynamic gate is in evaluation mode, the I/P must be monotonically rising. When the clock rises (evaluation mode), the I/P is high

so the output is discharged low through the pull down network. However, the pre-charged (PMOS) transistor is also OFF the O/P floats. This is called monotonicity problem.

The monotonicity problem can be solved by placing static CMOS inverter between dynamic gates. This converts the monotonically falling output into a monotonically rising signal suitable for the next stage CMOS gate.

### 11. Bistability principle:

Bistability principle states that, when the gain of the inverter in the transient region is larger than 1, there are two stable operating points (High and Low). Triggering is need change from one state to another.

Approaches for bistable circuit are

- i. cutting the feedback loop
- ii. overpowering the feedback loop

### 12. Principle of dynamic latch/ register

charge stored on the capacitor can be used to represent a logic signal is the principle of dynamic latch. The absence of the charge denotes '0', while its presence stands for a stored '1'.

### 13. Approaches used to create an edge triggered register:

- i. Master slave configuration
- ii. Gilitch approach pulse register
- iii. Sense amplifiers.

14. True single phase clocked register (TSPCR)

In the two phase docking schemes, care must be taken in routing the two clock signals to ensure that overlap is minimized. The true single phase clocked Register (TSPCR) uses a single clock. This reduces delay overhead and complexity.

15. Two properties of schmitt trigger:

- i. It response to a slowly changing input waveform with a fast transition time at the output.
- ii. The voltage transfer characteristic of the device displays different switching thresholds for positive & negative going input signals.

16. Clock gating:

A common method to reduce power in idle mode is the clock gating technique. In this approach, the main clock connection to a module is turned off (or gated) whenever the block is idle. However, clock gating does not reduce the leakage power of the idle block.

### 3 Dynamic Circuits:



Fig(2.1)

a) static CMOS

b) pseudo

c) dynamic

NMOS

CMOS inverter

$V_{DD}$



Fig(2.2) precharge and evaluation



Fig(2.3) Footed dynamic inverter



Fig(2.4)

a) Footed

b) unfooted



Fig (2.5) monotonicity problem

## 2.1. Domino logic



Fig 2.1.1 domino logic.

## 2.2 dual-rail domino logic



Fig 2.2 dual-rail domino logic

### 2.3 keepers



Fig 2.3 keepers

### 2.4. multiple output domino logic (MDDL)



Fig. 2.4 MDDL carry chain

### 2.5 NP and zipper domino



(a)



Fig 2.5 NP domino

#### 4. Pass transistor circuits

##### 2.4.1 CMOS with transmission gates



Fig 2.4.1. Transistor gate multiplexing



Fig 2.4.2 CMOS with transmission gates

##### 2.4.2 complementary pass transistor logic



Fig 2.4.2 CPOL

## Home Test

### 1. Binary Adder - circuit design consideration

#### 1.1 static CMOS adder circuit



28 Transistors

$$C_0 = AB + BC_i + AC_i$$

$$S = ABC_i + C_0 \{A + B + Ci\}$$

Implement the Full adder using logic equations and translate them into CMOS circuitry.

\* The complementary static circuitry has less transistor (28)

\* This circuit takes less latency in carry generation

\* This circuit implemented using OR and AND gates instead of XOR gate.

Advantages

\* Reduce logical effort

\* Functions at low power supply

\* Provide full swing output and reliability

Drawbacks

\* Propagation delay is high

\* consumes large area

\* Large PMOS transistor in pull-up NW gives high I/P capacitance causing extra delay and dynamic power.

## 1.2 Mirror Adder



24 Transistors

An improved adder circuit, also called mirror adder. Its operation is based on the below equation.

$$C_0(A, P) = A + P_C$$

$$S(A, P) = P \oplus C_1$$

The mirror adder

\* NMOS, PMOS chains - completely symmetrical. Maximum series two transistors can be observed in the carry-generation circuitry.

\* Laying out the cell, minimization of the capacitance at node  $C_0$  - most critical issue. Important - reduction of the diffusion capacitance.

\* Capacitance at node  $C_0$  - four diffusion capacitances, six gate capacitances in connecting adder cell.

\* Only one stage transistors optimized for optimal speed. All transistors sum stage can be minimal size.

Advantages

\* Reduction in delay and area

\* Reduced power consumption

\* Design procedure is easy

Drawbacks

\* Additional inverters required sum & carry Function.

### 1.3 Manchester carry chain adder

1.3.1 static manchester carry gate using propagate, generate and kill.

\* The propagate path is unchanged and  $C_i$  passed to go output if the propagate signal  $A_i \oplus X_i \oplus B_i$  is true.

\* Propagate condition not satisfied, o/p either pull low by  $D_i$  or pull up by  $G_{ii}$ .

1.3.2 Manchester carry chain in dynamic logic using propagate and generate

\* The transistors in dynamic circuits are monotonic

\* The transmission gate are replaced by nMOS pass transistors

\* o/p precharging eliminates the need of kill signal

1.3.3 Example in dynamic logic (4 bit section)



\* It uses cascade of pass transistors to implement the carry chain.

\* During precharge  $\phi = 0$ , all intermediate nodes of the pass transistor carry chain are precharged to VDD.

\* During evaluation the node  $A_k$  node is precharged.

when there is an incoming carry and propagate signal  $P_k$  is high or when the generate signal for stage  $k$  ( $G_{kk}$ ) is high.

\* The worst case delay is given by  $t_p = 0.69RC(N(N+1)/2)$

#### Advantages

\* Has fast carry & propagation O/P

\* It has better performance than static logic

\* Implemented using pass transistor which reduces total transistor count.

#### Drawbacks

\* Charge sharing and capacitive loading problem occurs.

\* It has threshold variation problem so level restoration circuits are needed.

## 2. Binary Adder - Logic design consideration

\* Ripple carry adder is used for the implementation of additions with a relatively small word length.

\* For large word length (multimedia processor requires word length up to (128 bits)) very fast computations is required)

#### Types:

1. Carry - Bypass Adder (carry skip Adder)

2. Linear carry - select Adder

3. Square - root carry - select adder

4. carry look ahead adder.

### 1. carry - Bypass Adder



idea: If ( $P_0$  and  $P_1$  and  $P_2$  and  $P_3 = 1$ )

then  $C_0,3 = C_0$ , else 'kill' or "generate".

\* When  $BP = P_0 P_1 P_2 P_3$  is 1 the very first incoming carry pass to bypass transistor, otherwise a carry is obtained via the normal route.

### Carry-Bypass Adder-path delay



M bits

\* For computing delay of an N bit adder, the total adder is divided into  $N/M$  equal length with M bits. The worst case delay for 16 bit carry bypass adder is given by

$$T_{\text{adder}} = t_{\text{setup}} + M t_{\text{carry}} + [N/M - 1] t_{\text{bypass}} + (M-1) t_{\text{carry}} + t_{\text{sum}}$$

$t_{\text{setup}} \rightarrow$  Fixed overhead time to create the generate and propagate signal

$t_{\text{carry}} \rightarrow$  propagation delay through single bit

$t_{\text{bypass}} \rightarrow$  propagation delay through the by-pass mux of a single stage

$t_{\text{sum}} \rightarrow$  Time to generate the sum of the signal

Propagation delay (carry ripple versus carry bypass)

\* Delay for ripple carry adder increases approximately for N bit.

\* Delay is linearly increases for Bypass adder

\* The cross over points depends upon technology considerations and situated between 4 and 8 bits



## Advantages

- \* Less delay than ripple carry adder

- \* Allow carry to skip over groups of M bits

## Drawbacks

- \* Area overhead created by adding the bypass path increased by 10-20%.

- \* Adding the bypass path breaks the regular bit-slice structure.

- \* For N-bit bypass the delay is linear.

## 2. Linear carry-select adder



- \* In ripple carry adder, every full adder has to wait for the incoming carry.

- \* In this method, consider both the possible values of carry (0 or 1) and evaluate the result for both in advance.

- \* Once the real value of incoming carry is known, the correct result is selected using a multiplexer with minimum delay.

## 4-bit carry select module



## 16 bit linear carry select Adder

\* A full carry select adder is by chaining a number of equal length adder stages.

\* The worst case propagation delay is given as

$$t_{\text{carry}} = t_{\text{setup}} + M t_{\text{carry}} + (N/M) t_{\max} + t_{\text{sum}}$$

$N \rightarrow$  Total number of bits

$M \rightarrow$  Number of bits per stage respectively

$t \rightarrow$  delay of carry through a single full adder

\* The carry delay in a single block is proportional to the length of that stage or equals  $M$ .

\* The proportional delay of adder is linearly proportional to  $N$ .

\* Because the block-select signal selects between 0 and 1 ripple through all stages in the worst case.

### 16 bit linear carry select adder:



Advantages

\* Less propagation delay

\* Reduce the computation delay

Drawbacks

\* High power consumption

### 3. square root carry select adder



- \* consider the linear carry-select adder - the full adder and multiplexer have identical propagation delays with the normalized value of 1.
- \* The worst case arrival times of the signals at the different network nodes with respect to the time i/p is applied are marked.
- \* For the last multiplexer, the inputs are the two carry chains of the block multiplexer signal from the previous stage.
- \* The result of the carry chains are stable long before the multiplexer signal arrives
- \* So the delay has to be equalized through the both paths
- \* To overcome this, progressively add more bits to the subsequent stages in the adder, requiring more time for the generation of the carry signals.
- \* The 1st stage can add 2 bits, the 2nd stage contains 3, the 3rd has 4.
- \* The 1st stage adds M bits, an additional bit added to each subsequent stage

$$N = M + (M+1) + (M+2) + (M+3) + \dots + (M+P-1)$$

simplified to

$$N = P^2/2 \quad (\text{or}) \quad P = \sqrt{2}N$$

The worst case propagation delay

$$t_{\text{add}} = t_{\text{setup}} + p t_{\text{carry}} + \sqrt{2N} t_{\text{mux}} + t_{\text{sum}}$$

Adder delays - comparison



For large value of N,  $t_{\text{add}}$  become almost constant