

---

# **CSE477**

# **VLSI Digital Circuits**

# **Fall 2002**

## **Lecture 20: Adder Design**

Mary Jane Irwin ( [www.cse.psu.edu/~mji](http://www.cse.psu.edu/~mji) )  
[www.cse.psu.edu/~cg477](http://www.cse.psu.edu/~cg477)

[Adapted from Rabaey's *Digital Integrated Circuits*, ©2002, J. Rabaey et al.]

# Review: Basic Building Blocks

- Datapath

- Execution units
  - Adder, multiplier, divider, shifter, etc.
- Register file and pipeline registers
- Multiplexers, decoders

- Control

- Finite state machines (PLA, ROM, random logic)

- Interconnect

- Switches, arbiters, buses

- Memory

- Caches (SRAMs), TLBs, DRAMs, buffers

# The 1-bit Binary Adder



$$G = A \& B$$

$$P = A \oplus B$$

$$K = !A \& !B$$

| A | B | C <sub>in</sub> | C <sub>out</sub> | S | carry status |
|---|---|-----------------|------------------|---|--------------|
| 0 | 0 | 0               | 0                | 0 | kill         |
| 0 | 0 | 1               | 0                | 1 | kill         |
| 0 | 1 | 0               | 0                | 1 | propagate    |
| 0 | 1 | 1               | 1                | 0 | propagate    |
| 1 | 0 | 0               | 0                | 1 | propagate    |
| 1 | 0 | 1               | 1                | 0 | propagate    |
| 1 | 1 | 0               | 1                | 0 | generate     |
| 1 | 1 | 1               | 1                | 1 | generate     |

$$\begin{aligned}S &= A \oplus B \oplus C_{\text{in}} &= P \oplus C_{\text{in}} \\C_{\text{out}} &= A \& B \mid A \& C_{\text{in}} \mid B \& C_{\text{in}} \quad (\text{majority function}) \\&= G \mid P \& C_{\text{in}}\end{aligned}$$

- ❑ How can we use it to build a 64-bit adder?
- ❑ How can we modify it easily to build an adder/subtractor?
- ❑ How can we make it better (faster, lower power, smaller)?

# FA Gate Level Implementations

- The way you learned to design in CSE271 and CSE471



# Review: XOR FA



16 transistors

# Review: CPL FA



20+8 transistors, dual rail – beware of threshold drops

# Review: Mirror Adder

24+4 transistors



**Sizing:** Each input in the carry circuit has a logical effort of 2 so the optimal fan-out for each is also 2. Since  $\neg C_{out}$  drives 2 internal and 2 inverter transistor gates (to form  $C_{in}$  for the nms bit adder) should oversize the carry circuit. PMOS/NMOS ratio of 2.

# Mirror Adder Features

---

- ❑ The NMOS and PMOS chains are **completely symmetrical** with a maximum of two series transistors in the carry circuitry, guaranteeing identical rise and fall transitions if the NMOS and PMOS devices are properly sized.
- ❑ When laying out the cell, the most critical issue is the minimization of the capacitances at node  $\text{!C}_{\text{out}}$  (four diffusion capacitances, two internal gate capacitances, and two inverter gate capacitances). Shared diffusions can reduce the stack node capacitances.
- ❑ The transistors connected to  $C_{\text{in}}$  are placed closest to the output.
- ❑ Only the transistors in the carry stage have to be optimized for optimal speed. All transistors in the sum stage can be minimal size.

# A 64-bit Adder/Subtractor

- ❑ Ripple Carry Adder (RCA)  
built out of 64 FAs
- ❑ Subtraction – complement all subtrahend bits (xor gates) and set the low order carry-in
- ❑ RCA
  - advantage: simple logic, so small (low cost)
  - disadvantage: slow (**O(N)** for N bits) and lots of glitching (so lots of energy consumption)



# Ripple Carry Adder (RCA)



$$T_{\text{adder}} \approx T_{\text{FA}}(A, B \rightarrow C_{\text{out}}) + (N-2)T_{\text{FA}}(C_{\text{in}} \rightarrow C_{\text{out}}) + T_{\text{FA}}(C_{\text{in}} \rightarrow S)$$

$T = O(N)$  worst case delay

Real Goal: Make the fastest possible carry path

# Inversion Property

- ❑ Inverting all inputs to a FA results in inverted values for all outputs



$$!S(A, B, C_{in}) = S(!A, !B, !C_{in})$$

$$!C_{out}(A, B, C_{in}) = C_{out}(!A, !B, !C_{in})$$

# Exploiting the Inversion Property



- ❑ Minimizes the critical path (the carry chain) by eliminating inverters between the FAs (will need to increase the transistor sizing on the carry chain portion of the mirror adder).

Now need two “flavors” of FAs

# Fast Carry Chain Design

- ❑ The key to fast addition is a low latency carry network
- ❑ What matters is whether in a given position a carry is
  - generated  $G_i = A_i \& B_i = A_i B_i$
  - propagated  $P_i = A_i \oplus B_i$  (sometimes use  $A_i \mid B_i$ )
  - annihilated (killed)  $K_i = !A_i \& !B_i$
- ❑ Giving a carry recurrence of

$$C_{i+1} = G_i \mid P_i C_i$$

$$C_1 = G_0 \mid P_0 C_0$$

$$C_2 = G_1 \mid P_1 G_0 \mid P_1 P_0 C_0$$

$$C_3 = G_2 \mid P_2 G_1 \mid P_2 P_1 G_0 \mid P_2 P_1 P_0 C_0$$

$$C_4 = G_3 \mid P_3 G_2 \mid P_3 P_2 G_1 \mid P_3 P_2 P_1 G_0 \mid P_3 P_2 P_1 P_0 C_0$$

# Manchester Carry Chain

- Switches controlled by  $G_i$  and  $P_i$



- Total delay of
  - time to form the switch control signals  $G_i$  and  $P_i$
  - setup time for the switches
  - signal propagation delay through  $N$  switches in the worst case

# 4-bit Sliced MCC Adder



# Domino Manchester Carry Chain Circuit



# Binary Adder Landscape



# Carry-Skip (Carry-Bypass) Adder



If ( $P_0 \& P_1 \& P_2 \& P_3 = 1$ ) then  $C_{o,3} = C_{i,0}$  otherwise the block **itself** kills or generates the carry internally

# Carry-Skip Chain Implementation



# 4-bit Block Carry-Skip Adder



**Worst-case delay** → carry from bit 0 to bit 15 = carry generated in bit 0, **ripples** through bits 1, 2, and 3, **skips** the middle two groups (B is the group size in bits), **ripples** in the last group from bit 12 to bit 15

$$T_{\text{add}} = t_{\text{setup}} + B t_{\text{carry}} + ((N/B) - 1) t_{\text{skip}} + B t_{\text{carry}} + t_{\text{sum}}$$

# Optimal Block Size and Time

- ❑ Assuming one stage of ripple ( $t_{carry}$ ) has the same delay as one skip logic stage ( $t_{skip}$ ) and both are 1

- ❑ So the optimal block size, B, is

$$dT_{CSkA}/dB = 0 \Rightarrow \sqrt{N/2} = B^{\text{opt}}$$

- ❑ And the optimal time is

$$\text{Optimal } T_{CSkA} = 2(\sqrt{2N}) + 1$$

# Carry-Skip Adder Extensions

## ❑ Variable block sizes

- ❑ A carry that is generated in, or absorbed by, one of the inner blocks travels a shorter distance through the skip blocks, so can have bigger blocks for the **inner carries** without increasing the overall delay



## ❑ Multiple levels of skip logic



# Carry-Skip Adder Comparisons



# Parallel Prefix Adders (PPAs)

- Define carry operator  $\epsilon$  on  $(G, P)$  signal pairs

$(G'', P'')$      $(G', P')$



where

$$G = G'' \vee P''G'$$
$$P = P''P'$$



- $\epsilon$  is associative, i.e.,

$$[(g''', p''')] \epsilon (g'', p'')] \epsilon (g', p') = (g''', p''') \epsilon [(g'', p'') \epsilon (g', p')]$$



# PPA General Structure

- Given  $P$  and  $G$  terms for each bit position, computing all the carries is equal to finding *all* the prefixes in parallel

$$(G_0, P_0) \in (G_1, P_1) \in (G_2, P_2) \in \dots \in (G_{N-2}, P_{N-2}) \in (G_{N-1}, P_{N-1})$$

- Since  $\in$  is associative, we can group them in any order
  - but note that it is **not** commutative



- Measures to consider
  - number of  $\in$  cells
  - tree cell depth (time)
  - tree cell area
  - cell fan-in and fan-out
  - max wiring length
  - wiring congestion
  - delay path variation (glitching)

# Brent-Kung PPA

Parallel Prefix Computation



# Kogge-Stone PPF Adder

Parallel Prefix Computation



$$T_{add} = t_{\text{setup}} + \log_2 N t_{\epsilon} + t_{\text{sum}}$$

# More Adder Comparisons



# Next Lecture and Reminders

---

## ❑ Next lecture

- Multiplier Design
  - Reading assignment – Rabaey, et al, 11.4

## ❑ Reminders

- Project final reports due December 5<sup>th</sup>
- HW5 (last one!) due November 19<sup>th</sup>
- Final grading negotiations/correction (except for the final exam) must be concluded by December 10<sup>th</sup>
- Final exam scheduled
  - Monday, December 16<sup>th</sup> from 10:10 to noon in 118 and 121 Thomas