



# *Digital Integrated Circuits*

## *A Design Perspective*

Jan M. Rabaey  
Anantha Chandrakasan  
Borivoje Nikolic

## Arithmetic Circuits

*January, 2003*

# *A Generic Digital Processor*



# *Building Blocks for Digital Architectures*

## **Arithmetic unit**

- Bit-sliced datapath (adder, multiplier, shifter, comparator, etc.)

## **Memory**

- RAM, ROM, Buffers, Shift registers

## **Control**

- Finite state machine (PLA, random logic.)
- Counters

## **Interconnect**

- Switches
- Arbiters
- Bus

# An Intel Microprocessor



Itanium has 6 integer execution units like this

# *Bit-Sliced Design*



**Tile identical processing elements**

# Bit-Sliced Datapath



# Itanium Integer Datapath





# Adders

# Full-Adder



| <i>A</i> | <i>B</i> | <i>C<sub>i</sub></i> | <i>S</i> | <i>C<sub>o</sub></i> | <i>Carry status</i> |
|----------|----------|----------------------|----------|----------------------|---------------------|
| 0        | 0        | 0                    | 0        | 0                    | delete              |
| 0        | 0        | 1                    | 1        | 0                    | delete              |
| 0        | 1        | 0                    | 1        | 0                    | propagate           |
| 0        | 1        | 1                    | 0        | 1                    | propagate           |
| 1        | 0        | 0                    | 1        | 0                    | propagate           |
| 1        | 0        | 1                    | 0        | 1                    | propagate           |
| 1        | 1        | 0                    | 0        | 1                    | generate            |
| 1        | 1        | 1                    | 1        | 1                    | generate            |

# The Binary Adder



$$S = A \oplus B \oplus C_i$$

$$= A\bar{B}\bar{C}_i + \bar{A}B\bar{C}_i + \bar{A}\bar{B}C_i + ABC$$

$$C_o = AB + BC_i + AC_i$$

# *Express Sum and Carry as a function of P, G, D*

Define 3 new variable which ONLY depend on A, B

$$\text{Generate } (G) = AB$$

$$\text{Propagate } (P) = A \oplus B$$

$$\text{Delete} = \overline{\overline{A}} \ \overline{\overline{B}}$$

$$C_o(G, P) = G + PC_i$$

$$S(G, P) = P \oplus C_i$$

Can also derive expressions for S and  $C_o$  based on D and P

Note that we will be sometimes using an alternate definition for

$$\text{Propagate } (P) = A + B$$

# The Ripple-Carry Adder



**Worst case delay linear with the number of bits**

$$t_d = O(N)$$

$$t_{\text{adder}} = (N-1)t_{\text{carry}} + t_{\text{sum}}$$

Goal: Make the fastest possible carry path circuit

# Complimentary Static CMOS Full Adder



28 Transistors

# Inversion Property



$$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C}_i)$$

$$\bar{C}_o(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C}_i)$$

# *Minimize Critical Path by Reducing Inverting Stages*



**Exploit Inversion Property**

# A Better Structure: The Mirror Adder



24 transistors

# Mirror Adder

Stick Diagram



# *The Mirror Adder*

- The NMOS and PMOS chains are **completely symmetrical**. A maximum of two series transistors can be observed in the carry-generation circuitry.
- When laying out the cell, the most critical issue is the minimization of the capacitance at node  $C_o$ . The reduction of the diffusion capacitances is particularly important.
- The capacitance at node  $C_o$  is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances in the connecting adder cell .
- The transistors connected to  $C_i$  are placed closest to the output.
- Only the transistors in the carry stage have to be optimized for optimal speed. All transistors in the sum stage can be minimal size.

# Transmission Gate Full Adder



Setup



# Manchester Carry Chain



# Manchester Carry Chain



# Manchester Carry Chain

## Stick Diagram

Propagate/Generate Row



# Carry-Bypass Adder



Also called  
Carry-Skip



Idea: If ( $P_0$  and  $P_1$  and  $P_2$  and  $P_3 = 1$ )  
then  $C_{o,3} = C_0$ , else “kill” or “generate”.

# Carry-Bypass Adder (cont.)



$$t_{\text{adder}} = t_{\text{setup}} + M t_{\text{carry}} + (N/M - 1) t_{\text{bypass}} + (M - 1) t_{\text{carry}} + t_{\text{sum}}$$

# *Carry Ripple versus Carry Bypass*



# Carry-Select Adder



# Carry Select Adder: Critical Path



# Linear Carry Select



$$t_{add} = t_{setup} + \left(\frac{N}{M}\right)t_{carry} + Mt_{mux} + t_{sum}$$

# Square Root Carry Select



$$t_{add} = t_{setup} + P \cdot t_{carry} + (\sqrt{2N}) t_{mux} + t_{sum}$$

# Adder Delays - Comparison



# LookAhead - Basic Idea



$$C_{o,k} = f(A_k, B_k, C_{o,k-1}) = G_k + P_k C_{o,k-1}$$

# Look-Ahead: Topology

Expanding Lookahead equations:

$$C_{o,k} = G_k + P_k(G_{k-1} + P_{k-1}C_{o,k-2})$$

All the way:

$$C_{o,k} = G_k + P_k(G_{k-1} + P_{k-1}(\dots + P_1(G_0 + P_0 C_{i,0})))$$



# Logarithmic Look-Ahead Adder



# **Carry Lookahead Trees**

$$C_{o,0} = G_0 + P_0 C_{i,0}$$

$$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$$

$$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$$

$$= (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$$

Can continue building the tree hierarchically.

# Tree Adders



16-bit radix-2 Kogge-Stone tree

# Tree Adders



16-bit radix-4 Kogge-Stone Tree

# Sparse Trees



16-bit radix-2 sparse tree with sparseness of 2

# Tree Adders



Brent-Kung Tree

# Example: Domino Adder



Propagate



Generate

# Example: Domino Adder



Propagate



Generate

# Example: Domino Sum





# *Multipliers*

42

# The Binary Multiplication

$$\begin{aligned} Z = \tilde{X} \times Y &= \sum_{k=0}^{M-1} Z_k 2^k \\ &= \left( \sum_{i=0}^{M-1} X_i 2^i \right) \left( \sum_{j=0}^{N-1} Y_j 2^j \right) \\ &= \sum_{i=0}^{M-1} \left( \sum_{j=0}^{N-1} X_i Y_j 2^{i+j} \right) \end{aligned}$$

with

$$\begin{aligned} X &= \sum_{i=0}^{M-1} X_i 2^i \\ Y &= \sum_{j=0}^{N-1} Y_j 2^j \end{aligned}$$

# The Binary Multiplication

$$\begin{array}{r} & 1 & 0 & 1 & 0 & 1 & 0 \\ \times & & & 1 & 0 & 1 & 1 \\ \hline & 1 & 0 & 1 & 0 & 1 & 0 \\ & 1 & 0 & 1 & 0 & 1 & 0 \\ & 0 & 0 & 0 & 0 & 0 & 0 \\ + & 1 & 0 & 1 & 0 & 1 & 0 \\ \hline & 1 & 1 & 1 & 0 & 0 & 1 & 1 & 0 \end{array}$$

Multiplicand

Multiplier

Partial products

Result

A binary multiplication diagram showing the process of multiplying the multiplicand 101010 by the multiplier 1011. The result is 11100110. The diagram illustrates the formation of partial products and their summation. A brace groups the partial products and the final result.

# The Array Multiplier



# The $M \times N$ Array Multiplier

## — Critical Path



$$t_{mult} \approx [(M-1) + (N-2)]t_{carry} + (N-1)t_{sum} + (N-1)t_{and}$$

# Carry-Save Multiplier



$$t_{mult} = (N-1)t_{carry} + (N-1)t_{and} + t_{merge}]$$

# Multiplier Floorplan



**X and Y signals are broadcasted through the complete array.  
( ..... )**

# Wallace-Tree Multiplier

Partial products

|   |   |   |   |   |   |   |
|---|---|---|---|---|---|---|
| 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| ○ | ○ | ○ | ○ | ○ | ○ | ○ |
| ○ | ○ | ○ | ○ | ○ | ○ | ○ |
| ○ | ○ | ○ | ○ | ○ | ○ | ○ |
| ○ | ○ | ○ | ○ | ○ | ○ | ○ |

(a)

First stage



(b)

Second stage



(c)

Final adder



(d)

# Wallace-Tree Multiplier



# Wallace-Tree Multiplier



# *Multipliers —Summary*

- Optimization Goals Different Vs Binary Adder
- Once Again: Identify Critical Path
- Other possible techniques
  - Logarithmic versus Linear (Wallace Tree Mult)
  - Data encoding (Booth)
  - Pipelining

FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION



# *Shifters*

# The Binary Shifter



# The Barrel Shifter



Area Dominated by Wiring

# *4x4 barrel shifter*



$$\text{Width}_{\text{barrel}} \sim 2 p_m M$$

# Logarithmic Shifter



# 0-7 bit Logarithmic Shifter



$$width_{\log} \approx p_m(2K + (1 + 2 + \dots + 2^{K-1})) = p_m(2^K + 2K - 1)$$