

# Low Power Approximate Multipliers using Compressors

- Objective: Design and implement an energy efficient multiplier using approximate compressors and encoded partial products.
- Motivation: Approximate computing reduces power and area with minimal accuracy loss.
- Block diagram of the 8x8 multiplier.



Designed by:- Deepak Kumar

# Exact 4x4 Multiplier

- In the multiplication of 2 4-bit terms the product is coming in stages
- In stage 1, half adder can simply use to produce sum as ( $\gamma_1$ ) and carry ( $c_1$ ) for next stage.
- In stage 2, there are already 3 partial product terms present, so we require 4:2 compressor that takes  $c_1$  and 3 pp terms as input and gives sum bit ( $\gamma_2$ ) and carry ( $c_2$ ) for next stage.

ORIGINAL PARTIAL PRODUCT OF THE MULTIPLICATION

| Stage<br>7 | Stage<br>6 | Stage<br>5 | Stage<br>4 | Stage<br>3 | Stage<br>2 | Stage<br>1 | Stage<br>0 |
|------------|------------|------------|------------|------------|------------|------------|------------|
| $pp_{3,3}$ | $pp_{3,2}$ | $pp_{3,1}$ | $pp_{3,0}$ | $pp_{2,0}$ | $pp_{1,0}$ | $pp_{0,0}$ |            |
|            | $pp_{2,3}$ | $pp_{2,2}$ | $pp_{2,1}$ | $pp_{1,1}$ | $pp_{0,1}$ |            |            |
|            |            | $pp_{1,3}$ | $pp_{1,2}$ | $pp_{0,2}$ |            |            |            |
|            |            |            | $pp_{0,3}$ |            |            |            |            |
| $\gamma_7$ | $\gamma_6$ | $\gamma_5$ | $\gamma_4$ | $\gamma_3$ | $\gamma_2$ | $\gamma_1$ | $\gamma_0$ |

# Theoretical Background

- Overview of Approximate 4:2 Compressor :
  - a . Exact 4:2 Compressor uses XOR, AND, OR that gives high power & delay.

$$Sum = x_1 \oplus x_2 \oplus x_3 \oplus x_4 \oplus C_{in},$$

$$C_{out} = (x_1 \oplus x_2) \cdot x_3 + (\overline{x_1 \oplus x_2}) \cdot x_1,$$

$$Carry = (x_1 \oplus x_2 \oplus x_3 \oplus x_4) \cdot C_{in} + (\overline{x_1 \oplus x_2 \oplus x_3 \oplus x_4}) \cdot x_4.$$



- b. Modified compressor : Ignoring XOR gates(high delay) and Cout(doesn't impact much on accuracy)is our goal.

By ignoring this there are 5/7 error values in carry/sum; to incorporate this error we give the inputs to compressor using propagate and generate signals.

# Propagators

It performs logical OR of two partial products.

*Propagator*

$$P_{i,j} = P_{i,j} + P_{j,i}$$
$$P_{2,0} = P_{2,0} + P_{0,2}$$
$$\downarrow$$
$$P_{2,0} = a_2 b_0 + a_0 b_2$$

( $P_{2,0}$ )

$$P_{i,j} = a_i \cdot b_j$$
$$P_{2,0} = a_2 \cdot b_0$$



# Generators

It performs logical AND of two partial products.



# Stage 2 Compressor

- Inputs:  $G_{20}$ ,  $P_{20}$ ,  $pp_{11}$  and carry  $c_1$ .
- Simplified Logic (from paper eq. 3):
- Sum= $x_1+x_3$ , Carry= $x_2+x_4$
- Reduced from 5/7 faulty outputs to 2/4 faulty outputs.



# Stage 3 Compressor

- Inputs:  $c_2, G_{30}+G_{21}, P_{21}, P_{30}$
- Logic Simplification (from paper Eq. 5):
- Sum =  $x_1+x_3+x_4$ , Carry =  $(x_1.x_2) + (x_3.x_4)$
- Acts like a 5:2 compressor with reduced complexity.



Designed by:- Deepak Kumar

# Stage 4 Compressor

- Inputs:  $G_{31}$ ,  $P_{31}$ ,  $pp_{22}$ , and  $c_3$ .
- Simplified logic (from paper Eq. 6):
- Sum =  $x_1+x_2+x_3$ , Carry =  $x_2 + (x_3 \cdot x_4)$
- Number of error cases further reduced with negligible probability.



# Final Implementation of 4x4

- Combined the Stage 2–4 compressors and Gij/Pij generators.
- Architecture follows the paper's **M2 design** (rejecting  $c_4$  carry).
- As speed is high in case of M2 design compared to M1(where  $c_4$  is fed into stage 5 and hence reducing the speed).



4:2 Compressor, Table 4

Full Adder

Half Adder

4:2 Compressor, Table 6

4:2 Compressor, Table 5



# Building 8x8 Multiplier Using 4x4 Blocks

The 8-bit operands A and B are divided into two 4-bit halves:

- $A = \{AH, AL\} \rightarrow$  upper 4 bits (AH) and lower 4 bits (AL)
- $B = \{BH, BL\} \rightarrow$  upper 4 bits (BH) and lower 4 bits (BL)
- Each combination of these halves forms **four 4x4 multipliers**:
- $P_1 = \alpha H \times \beta H \rightarrow$  Most significant partial product
- $P_2 = \alpha L \times \beta H$
- $P_3 = \alpha H \times \beta L$
- $P_4 = \alpha L \times \beta L \rightarrow$  Least significant partial product
- These four 4x4 products are then shifted and added to form the final 8x8 result:

$$\begin{array}{r} AH \quad AL \\ BH \quad BL \\ \hline b_L \cdot a_H + b_H \cdot a_L \\ b_H \cdot a_H \\ \hline b_L \cdot a_L \end{array} \times$$



# The Final Block



Designed by:- Deepak Kumar

# Exact 8x8 Multiplier



Designed by:- Deepak Kumar

# Comparison Table

All the multipliers were tested with 8bit ones (255) as A and B.

| Exact            | Carry Disregard<br>Approximate Multiplier | Low Power<br>Approximate<br>Multipliers using<br>Compressors | Hybrid Parallel-Product<br>High Performance<br>Approximate<br>Recursive Multipliers |
|------------------|-------------------------------------------|--------------------------------------------------------------|-------------------------------------------------------------------------------------|
| 65025            | 40661                                     | 45373                                                        | 62241                                                                               |
| 0111111000000001 | 01001111011010101                         | 01011000100111101                                            | 01111001100100001                                                                   |
| Error Distance   | 6                                         | 9                                                            | 4                                                                                   |

# Performance Comparison Table

We tested these multipliers for many different inputs and took average power and delay of them..

| 8x8 Multiplier                                                             | Average Delay       | Average Power      |
|----------------------------------------------------------------------------|---------------------|--------------------|
| Exact Multiplier                                                           | 5.9986775 ns        | 477.63918 μW       |
| Carry Disregard Approximate Multiplier                                     | <b>5.7585925 ns</b> | 232.48725 μW       |
| Low Power Approximate Multipliers using Compressors                        | 5.802965 ns         | <b>228.0562 μW</b> |
| Hybrid Parallel-Product High Performance Approximate Recursive Multipliers | 5.916225 ns         | 289.4195 μW        |