

# Arithmetic Circuits

---

Virendra Singh

Professor

Computer Architecture and Dependable Systems Lab

Department of Computer Science & Engineering, and

Department of Electrical Engineering

Indian Institute of Technology Bombay

<http://www.cse.iitb.ac.in/~viren/>

E-mail: viren@{cse, ee}.iitb.ac.in

*CS-230: Digital Logic Design & Computer Architecture*

---



Lecture 13 (03 February 2022)

**CADSL**

# Arithmetic

## Circuits:

(Adders)



# Full-Adder Adds Three Bits

---



# 32-bit Ripple-Carry Adder

$$\begin{array}{r} c_{32} \ c_{31} \dots \ c_2 \ c_1 \ 0 \\ a_{31} \dots \ a_2 \ a_1 \ a_0 \\ + b_{31} \dots \ b_2 \ b_1 \ b_0 \\ \hline s_{31} \dots \ s_2 \ s_1 \ s_0 \end{array}$$



cost  $\propto n \cdot (\text{cost of FA})$   
delay  $\propto n$ .

Cut down carry propagation path

# Ripple Carry Adder

---

- Easy to create
  - Good hierarchy
  - Tileable structures
- Small hardware
- Slow!
  - Design is limited by the delays in propagating carry through all of the bitwise additions
  - The output bit at position  $m$  is not valid until after the carry out of the  $m-1$  position is ready



# FAST ADDERS



# Faster Addition

- For big adders, the carry-chain is very long
- Separate the carry chain and sum logic
- Partial Full Adders
  - Contain only the sum part of a FA





# Carry Lookahead

A:  $DB_i'$

- Given Stage  $i$  from a Full Adder, we know that there will be a carry generated when  $A_i = B_i = "1"$ , whether or not there is a carry-in.
- Alternately, there will be a carry propagated if the “half-sum” is “1” and a carry-in,  $C_i$  occurs.
- These two signal conditions are called generate, denoted as  $G_i$ , and propagate, denoted as  $P_i$  respectively and are identified in the circuit:



# Reassembling RCA Using PFAs

- Can create ripple-carry adder with PFAs
  - G: **Generates** a carry at this position
  - P: **Propagates** a carry through this position



# Carry Lookahead (continued)

- In the ripple carry adder:
  - $G_i$ ,  $P_i$ , and  $S_i$  are local to each cell of the adder
  - $C_i$  is also local each cell
- In the carry lookahead adder, in order to reduce the length of the carry chain,  $C_i$  is changed to a more global function spanning multiple cells
- Defining the equations for the Full Adder in term of the  $P_i$  and  $G_i$ :

$$P_i = \underline{A_i} \oplus \underline{B_i}$$
$$S_i = P_i \oplus C_i$$

$$G_i = \underline{\underline{A_i} \underline{\underline{B_i}}}$$
$$C_{i+1} = \underbrace{G_i}_{\text{---}} + \underbrace{P_i}_{\text{---}} \underbrace{C_i}_{\text{---}}$$

$$A_i \oplus B_i \oplus C_i$$



# Carry Lookahead Development

- Flatten equations for carry using  $G_i$  and  $P_i$  terms for less significant bits

$$G_0 = A_0 \cdot B_0$$

- Beginning at the cell 0 with carry in  $C_0$ :

$$P_0 = A_0 \oplus B_0$$

$$C_1 = G_0 + P_0 C_0$$



$$C_2 = G_1 + P_1 C_1 \quad C_1 = G_1 + P_1(G_0 + P_0 C_0)$$

$$= G_1 + P_1 G_0 + P_1 P_0 C_0$$



# Carry Lookahead Development

$$C_3 = G_2 + P_2 \quad C_2 = G_2 + P_2(G_1 + P_1G_0 + P_1P_0 C_0)$$

$$= \textcircled{G_2} + P_2 \cancel{G_1} + P_2 P_1 \cancel{G_0} + P_2 P_1 P_0 C_0$$

4<sub>2</sub>

$$\checkmark \quad C_4 = G_3 + P_3 \quad C_3 = G_3 + P_3 G_2 + P_3 P_2 G_1 \\ + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0$$



# Carry Lookahead Adder

$$\begin{aligned}G_0 &= A_0 \bar{B}_0 \\G_1 &= A_1 \bar{B}_1 \\P_0 &= P_0 \oplus B_0 \\P_1 &= A_0 \oplus B_1\end{aligned}$$

$$G_1 + P_1 G_0 + P_1 P_0 C_0$$



# Carry Lookahead Adder

delay  $\propto n$

$$4T + (n+1)T = 5T + (n)T$$

BJT       $\rightarrow$  delay is independent of input.

TTL       $\rightarrow$  constant delay.



# Carry Look-Ahead Adder

---

- As usual, can trade area/power for speed
  - Multi-level logic (ripple carry) reduces area
  - Flattening carry logic increases speed
- Sometimes called CLA

Break carry chain



# Speeding Up the Adder



# Fast Adders

---

- In general, any output of a 32-bit adder can be evaluated as a logic expression in terms of all 65 inputs.  

- Number of levels of logic can be reduced to  $\log_2 N$  for N-bit adder. Ripple-carry has N levels.
- More gates are needed, about  $\log_2 N$  times that of ripple-carry design.





# Prefix Computation



# Key Architectures for Carry Calculation

---

- 1960: J. Sklansky adder
- 1973: Kogge-Stone adder ✓
- 1980: Ladner-Fisher adder
- 1982: Brent-Kung adder
- 1987: Han Carlson adder
- 1999: S. Knowles adder

Other parallel adder architectures:

- 1981: H. Ling adder
- 2001: Beaumont-Smith



# Binary Addition

---

- **Input:** two  $n$ -bit binary numbers  $a_{n-1} \dots a_1 a_0$  and  $b_{n-1} \dots b_1 b_0$ , one bit carry-in  $c_0$
- **Output:**  $n$ -bit sum  $s_{n-1} \dots s_1 s_0$  and one bit carry out  $c_n$
- Prefix Addition: Carry generation & propagation

Generate:  $g_i = a_i \cdot b_i$

Propagate:  $p_i = a_i \oplus b_i$

$$c_{i+1} = g_i + p_i \cdot c_i$$

$$s_i = c_i \oplus (a_i \oplus b_i)$$



# Binary Addition as a prefix sum problem.



A stage  $i$  will generate a carry if  
 $g_i = a_i \cdot b_i$   
and propagate a carry if  
 $p_i = \text{XOR}(a_i, b_i)$   
Hence for stage  $i$ :

$$c_i = g_i + p_i c_{i-1}$$

With :

$$(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$$

$(G_0, P_0) = (g_0, p_0)$

✓  $(g_x, p_x) \circ (g_y, p_y) = (g_x \# p_x \cdot g_y, p_x \cdot p_y)$

We have:

$$(G_1, P_1) = (g_1, p_1)$$

$$(G_2, P_2) = (g_2, p_2) \circ (G_1, P_1) = (g_2 \# p_2 \cdot g_1, p_2 \cdot p_1)$$

$$\begin{aligned} (G_3, P_3) &= (g_3, p_3) \circ (G_2, P_2) = (g_3 + p_3 \cdot (g_2 + p_2 \cdot g_1), p_3 \cdot p_2 \cdot p_1) \\ &= (g_3 + p_3 \cdot g_2 + p_3 \cdot p_2 \cdot g_1, p_3 \cdot p_2 \cdot p_1) \end{aligned}$$

etc ...

Where :



... The familiar  
carry bit generating  
equations for stage  $i$   
in a CLA adder.



# Parallel Prefix Adders

- The parallel prefix adder employs the 3-stage structure of the CLA adder. The improvement is in the carry generation stage which is the most intensive one:



# Prefix Addition – Formulation

Pre-processing:

$$g_i = a_i b_i \quad p_i = a_i \oplus b_i$$

Prefix Computation:

$$\begin{aligned} G_{[i:k]} &= G_{[i:j]} + P_{[i:j]} G_{[j+1:k]} \\ P_{[i:k]} &= P_{[i:j]} P_{[j+1:k]} \end{aligned}$$

Post-processing:

$$\begin{aligned} c_{i+1} &= G_{[i:0]} + P_{[i:0]} \cdot c_0 \\ s_i &= p_i \oplus c_i \end{aligned}$$



generated  
for preo  
group



# Computation of Carries – Prefix Graphs

The components usually seen in a prefix graph are the following:

*processing component:*



*buffer component:*



$$(g_{out}, p_{out}) = (g_{in_1} + p_{in_1} \cdot g_{in_2}, p_{in_1} \cdot p_{in_2})$$

$$(g_{out}, p_{out}) = (g_{in}, p_{in})$$

# Prefix graphs for representation of Prefix addition ✓

- Serial adder carry generation represented by prefix graphs



# Prefix Adder – Prefix Structure Graph



# Classical Prefix Adders



Brent-Kung:  
Logical levels:  $2\log_2 n - 1$   
Max fanouts: 2



Sklansky:  
Logical levels:  $\log_2 n$   
Max fanouts:  $n/2$



Kogge-Stone:  
Logical levels:  $\log_2 n$   
Max fanouts: 2



# Kogge Stone Adder

$$A = 1001 \quad B = 1100 \quad \text{Sum} = 10101$$



Legend:



$$P = A_i \text{ xor } B_i \quad P = P_i \text{ and } P_{i-1}$$

$$G = A_i \text{ and } B_i \quad G = (P_i \text{ and } G_{i-1}) \text{ or } G_i$$

$$C_i = G_i$$

$$S_i = P_i \text{ xor } C_{i-1}$$



# 1973: Kogge-Stone adder



- The Kogge-Stone adder has:
  - Low depth
  - High node count (implies more area).
  - Minimal fan-out of 1 at each node (implies faster performance).



# 1982: Brent-Kung (BK) Adder



Brent and Kung, “A regular layout for parallel adders”, In IEEE transaction for Computers, 1982



# Brent-Kung (BK) Adder



multiplier

$A_i$





19

$$a_1 \cdot \overline{b_0} + b_1 \cdot \overline{a_0}$$

$$(a_0 \cdot \overline{b_1})_f$$



$a_3 \ a_2 \ a_1 \ a_0$

$b_3 \ b_2 \ b_1 \ b_0$



Carry  
Save  
Doch.

$$\begin{array}{r} 0 \\ 278 \\ 151 \\ \hline 429 \end{array} \rightarrow \begin{array}{r} 278 \\ 151 \\ \hline 329 \end{array}$$
$$\begin{array}{r} 329 \\ 100 \leftarrow \\ \hline 429 \rightleftarrows \end{array}$$
$$\begin{array}{r} 278 \\ 151 \\ \hline 0.10 \leftarrow \end{array}$$

Array Multiplic



# Thank You

