

# IVLSI 6 Bit Wallace Tree Multiplier

## Contents:

- Background Theory (Wallace tree Multiplier delay compared to Array Multiplier, reduction of stages)
- Basic components: Schematic/layout(should be able to recognise comp/drv/lvs/graphs of each component)
- Complete circuit -schematic/layout/drc/lvs/waveform, **specs:area,delay**, total number of components
- Questions to be answered: how did you optimize area?
- Basic procedure: symbol/tb/layout link

## Background Theory:

**Array multiplier**

| X  | X3    | X2    | X1    | X0    |
|----|-------|-------|-------|-------|
| Y  | Y3    | Y2    | Y1    | Y0    |
|    | X3•Y0 | X2•Y0 | X1•Y0 | X0•Y0 |
|    | X3•Y1 | X2•Y1 | X1•Y1 | X0•Y1 |
|    | X3•Y2 | X2•Y2 | X1•Y2 | X0•Y2 |
| +  | X3•Y3 | X2•Y3 | X1•Y3 | X0•Y3 |
| P7 | P6    | P5    | P4    | P3    |
|    | P2    | P1    | P0    |       |

- Carry must ripple from right to left and then down
  - O(N) adder delays, where N is number of bits in multiplicand

CpE 5210– Intro.VLSI Design  
Fall 2015

## Binary Multiplication

The "Binary" Multiplication Table

|   |   |   |
|---|---|---|
| X | 0 | 1 |
| 0 | 0 | 0 |
| 1 | 0 | 1 |

Binary multiplication is implemented using the same basic longhand algorithm that you learned in grade school.

$$\begin{array}{r}
 & A_3 & A_2 & A_1 & A_0 \\
 \times & B_3 & B_2 & B_1 & B_0 \\
 \hline
 A_j B_i \text{ is a "partial product"} & \longrightarrow & A_3 B_0 & A_2 B_0 & A_1 B_0 & A_0 B_0 \\
 & & A_3 B_1 & A_2 B_1 & A_1 B_1 & A_0 B_1 \\
 & & A_3 B_2 & A_2 B_2 & A_1 B_2 & A_0 B_2 \\
 & + & A_3 B_3 & A_2 B_3 & A_1 B_3 & A_0 B_3 \\
 \hline
 \end{array}$$

Multiplying N-digit number by M-digit number gives (N+M)-digit result

Easy part: forming partial products (just an AND gate since  $B_i$  is either 0 or 1)  
 Hard part: adding M, N-bit partial products

# Simple Combinational Multiplier



## The Array Multiplier



# The MxN Array Multiplier

## — Critical Path



$$t_{mult} = [(M - 1) + (N - 2)] t_{carry} + (N - 1) t_{sum} + t_{and}$$

Parallel addition of partial products does not take bits from previous column unlike array multiplier.

## Wallace-Tree Multiplier



In a conventional Wallace multiplier, the number of rows in subsequent stages can be calculated as:

$$r_{i+1} = 2[r_i/3] + r_i \bmod 3 \quad (1)$$

Where,  $r_i \bmod 3$  denotes the smallest non-negative remainder of  $r_i/3$ . The number of rows in the subsequent

stages of a conventional 8-bit by 8-bit Wallace multiplier are calculated using (1), that are:  $r_0=N=9$ ,  $r_1=6$ ,  $r_2=4$ ,  $r_3=3$ ,  $r_4=2$ . Therefore, four stages are required to perform the reduction procedure, each with delay of one full adder.

Wallace Tree Multiplier : [example  $4 \times 4$ ]

Partial Products :  $\xrightarrow{\text{"0" delay}}$  First Stage



[Don't add previous  
o/p bit, one bit stays in  
column and the other bit passed]

one bit in same  
column, one bit  
passed to next column

FINAL STAGE :



Second Stage :



FA: 3:2 Compressor

3 i/p 2 o/p

$$\begin{aligned}
 r_{i+1} &= 2 \lceil r_{i/3} \rceil + r_i \bmod 3 \\
 r_{i+2} &= 2 \left[ \lceil r_{(i+1)/3} \rceil + r_{i+1} \bmod 3 \right] \\
 &= 2 \left[ \left[ 2 \cdot \lceil r_{i/3} \rceil + r_i \bmod 3 \right] \right]_3 + r_{i+1} \bmod 3 \\
 r_{i+n} &\propto \left(\frac{2}{3}\right)^n \cdot r_i \\
 \left(\frac{3}{2}\right)^n &\propto \frac{r_i}{r_{i+n}} \propto N \left[ \text{number of bits} \right] \\
 \left(\frac{3}{2}\right)^n &= k \cdot N
 \end{aligned}$$

## Wallace Tree Mult. Performance

$$\left(\frac{3}{2}\right)^n = \frac{N}{2}$$

Full adder is 3 to 2 compressor, N is number of bits  
 n is number of stages other than final stage

$$n \log_2 \left(\frac{3}{2}\right) = \log_2 \left(\frac{N}{2}\right)$$

$$n = \frac{\log_2(N/2)}{\log_2(3/2)} = 1.71 (\log_2 N - 1)$$

$$t_{mult} = 1.71 (\log_2 N - 1) t_{sum} + t_{adder}$$

## A 6x6 Wallace Tree Multiplier

$X = 010011$   
 $Y = 101010$   
 $OUT = \dots \dots$



From <http://www10.edacafe.com/book/ASIC/CH02/CH02.6.php>

CpE 5210– Intro.VLSI Design  
 Fall 2015

D. Beetner  
 project

|          |          |          |          |          |          |
|----------|----------|----------|----------|----------|----------|
| $x_5$    | $x_4$    | $x_3$    | $x_2$    | $x_1$    | $x_0$    |
| $y_5$    | $y_4$    | $y_3$    | $y_2$    | $y_1$    | $y_0$    |
| $s_{50}$ | $s_{40}$ | $s_{30}$ | $s_{20}$ | $s_{10}$ | $s_{00}$ |
| $s_{51}$ | $s_{41}$ | $s_{31}$ | $s_{21}$ | $s_{11}$ | $s_{01}$ |
| $s_{52}$ | $s_{42}$ | $s_{32}$ | $s_{22}$ | $s_{12}$ | $s_{02}$ |
| $s_{53}$ | $s_{43}$ | $s_{33}$ | $s_{23}$ | $s_{13}$ | $s_{03}$ |
| $s_{54}$ | $s_{44}$ | $s_{34}$ | $s_{24}$ | $s_{14}$ | $s_{04}$ |
| $s_{55}$ | $s_{45}$ | $s_{35}$ | $s_{25}$ | $s_{15}$ | $s_{05}$ |

How to group:

less than 3 bits merge

a product bit

0<sup>th</sup> stage

10 9 8 7 6 5 4 3 2 1 0



1<sup>st</sup> Stage



\* one bit stays  
in same column  
next bit goes to next  
column

2<sup>nd</sup> Stage



Direct Implementation of schematic shown in slide

3<sup>rd</sup> Stage



4<sup>th</sup> stage



5<sup>th</sup> stage



6<sup>th</sup> stage [ 4 "ripple" adders ]





#### Procedure for reduction of stages:

- Whenever there are three bits in a column use FA as 3:2 compressor
- Use HA as 2:2 compressor only when it will produce a product bit
- The picture on next page shows reduction using this method, it is different from textbook method only from stage 2 to final ripple carry stage

2<sup>nd</sup> Stage

A hand-drawn number line on a light gray background. The numbers 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, and 0 are written in black ink above the line. Below the line, there are three vertical oval shapes. The first oval contains four dots at different heights. The second oval contains five dots. The third oval contains four dots.

### 3<sup>rd</sup> Stage

A hand-drawn number line on lined paper. The numbers 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, and 0 are written above the line. Below the line, there are three large, vertically oriented ovals. The first oval is positioned under the digits 7, 6, and 5. The second oval is positioned under the digits 4, 3, and 2. The third oval is positioned under the digits 1 and 0.

## 4<sup>th</sup> Stage

5<sup>th</sup> Street

A hand-drawn number line on a light-colored surface. The numbers 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, and 0 are written in black ink above horizontal tick marks. A large circle is drawn around the number 6. An arrow points from the bottom left towards the circle.

## Ripple carry adder Stage

## Complete circuit

- Vdd=1.8V
- Area=  $60.49\text{um} \times 259.51\text{um} = 15723.71 \text{ um}^2 = 0.015725 \text{ mm}^2$
- How did you measure delay ? Testbench used?
- Delay for the sequence (101101 x 100111= 0110 1101 1011) = 1.39 ns

Measuring Area:



$$\begin{array}{ccccccccc}
 & B_5 & B_4 & B_3 & B_2 & B_1 & B_0 & \times & A_5 \ A_4 \ A_3 \ A_2 \ A_1 \ A_0 \\
 \text{INPUTS:} & 1 & 0 & 1 & 1 & 0 & 1 & \times & 1 \ 0 \ 0 \ 1 \ 1 \ 1 \\
 & & & & & & & = & 0 \ 1 \ 1 \ 0 \ 1 \ 1 \ 0 \ 1 \ 1 \\
 & & & & & & & P_0 & P_1 \ P_2 \ P_3 \ P_4 \ P_5 \ P_6 \ P_7 \ P_8 \ P_9 \ P_{10} \ P_{11} \ P_{12} \ P_{13} \ P_{14} \ P_{15}
 \end{array}$$

Binary '1' = 1.8 V  
Binary '0' = 0 V



$$\begin{aligned}
 \text{Propagation Delay} &= 1.397 \text{ ns [output } P_7] - 7.005 \text{ ps} \\
 &\approx 1.39 \text{ ns}
 \end{aligned}$$

Simulation time: 200ns



## Wallace Tree Multiplier:



Wallce Tree Multiplier Schematic

Full-adders with a binary '0' as one of the inputs were replaced with half-adders.



Wallce Tree Multiplier Symbol



Wallace Tree Multiplier Test Bench

### Simulations:

All the inputs were defined as with a voltage waveform defined by “vpulse”. Active high voltage was used with binary ‘1’ being represented by voltage level of ‘1v’, and binary ‘0’ being represented by a voltage level of ‘0v’.

Binary ‘1’ was defined by “vpulse” by giving ‘v1’ and v2’ as 0v and 1v. Binary ‘0’ was similarly defined but both ‘v1’ and ‘v2’ were given a voltage level of ‘0v’.

## Combination 1:



## Combination 2:





*Vallace Tree Layout*



*Wallace Tree Multiplier Layout (Zoomed in)*

```

ONE LAYER BOOLEAN: Cumulative Time CPU =      0(s) REAL =      0(s)
TWO LAYER BOOLEAN: Cumulative Time CPU =      0(s) REAL =      0(s)
POLYGON TOPOLOGICAL: Cumulative Time CPU =    0(s) REAL =      0(s)
POLYGON MEASUREMENT: Cumulative Time CPU =    0(s) REAL =      0(s)
SIZE: Cumulative Time CPU =                  0(s) REAL =      0(s)
EDGE TOPOLOGICAL: Cumulative Time CPU =      0(s) REAL =      0(s)
EDGE MEASUREMENT: Cumulative Time CPU =      0(s) REAL =      0(s)
STAMP: Cumulative Time CPU =                  0(s) REAL =      0(s)
ONE LAYER DRC: Cumulative Time CPU =        0(s) REAL =      0(s)
TWO LAYER DRC: Cumulative Time CPU =        0(s) REAL =      0(s)
NET AREA: Cumulative Time CPU =            0(s) REAL =      0(s)
DENSITY: Cumulative Time CPU =            0(s) REAL =      0(s)
MISCELLANEOUS: Cumulative Time CPU =        0(s) REAL =      0(s)
CONNECT: Cumulative Time CPU =            0(s) REAL =      0(s)
DEVICE: Cumulative Time CPU =            0(s) REAL =      0(s)
ERC: Cumulative Time CPU =                0(s) REAL =      0(s)
PATTERN_MATCH: Cumulative Time CPU =      0(s) REAL =      0(s)

Total CPU Time : 1(s)
Total Real Time : 1(s)
Peak Memory Used : 22(M)
Total Original Geometry : 1801(56323)
Total DRC RuleChecks : 572
Total DRC Results : 0 (0)
Summary can be found in file wallace_tree_new.sum
ASCII report database is /tmp/src356/DRC/wallace_tree_new.drc_errors.ascii
Checking in all SoftShare licenses.

Design Rule Check Finished Normally. Thu Dec 3 09:45:43 2015

```

### DRC for Complete Wallace Tree Multiplier

```

Item with name 'PVS Build Date' is already on the list - ignore
Item with name 'PVS Job Time' is already on the list - ignore
Item with name 'Layout GDSII' and path '/tmp/src356/LVS/wallace_tree_new.gds' is al
Item with name 'Rule File' and path '/tmp/src356/LVS/.technology.rul' is already on
Item with name 'Rule File' and path '/nethome/users/src356/GPDK/blackbox.pvl' is al
Item with name 'Run Directory' is already on the list - ignore
Item with name 'PVS Build Date' is already on the list - ignore
Item with name 'PVS Job Time' is already on the list - ignore

Netlist Comparison Finished Normally. Thu Dec 3 09:47:26 2015

Creating cross reference ...
*****
pvs_RCXxref 13.1.1-s016 64 bit (Wed Apr 2 17:59:14 PDT 2014)
Build Ref No.: 016 (04-02-2014)

Copyright 2014 Cadence Design Systems, Inc.
All rights reserved worldwide.

Build O/S: Linux x86_64 2.6.18-194.el5
Executed on: rcl2xece107 (Linux x86_64 3.13.0-70-generic)
Starting Time: Thu Dec 3 09:47:27 2015 (Thu Dec 3 15:47:27 2015 GMT)
With parameters: -format C /tmp/src356/LVS/svdb/wallace_tree_new wallace_tree_new.r
****

Total CPU Time : 0(s)
Total Real Time : 0(s)
Memory Used : 45.14(M)

pvs_RCXxref Done
Checking in all SoftShare licenses.


```

### LVS Check for complete Wallace Tree Multiplier



AV extraction Wallace Tree:

They are parasitic capacitances, which are side effects formed by different layers you used for your layout

[More info on this](#)



## Basic Components:

- And Gate
- OR Gate
- XOR Gate
- Half Adder
- Full Adder

## AND Gate:

1. Transistor level schematic:

Inputs: A,B.

Output: vout



## 2. Layout



### 3.LVS



### 4.DRC



## 5. Symbol



## 6. Testbench Schematic



## 7. Output simulations



| A | B | out |
|---|---|-----|
| 0 | 0 | 0   |
| 0 | 1 | 0   |
| 1 | 0 | 0   |
| 1 | 1 | 1   |

## OR Gate:

### 1. Transistor level schematic

Inputs: A,B

Output:vout



## 2. Layout



### 3. DRC



### 4.LVS



## 5. Symbol



## 6. Testbench Schematic



## 7. Output Simulations



As observed from the graph the output simulation match that of the truth table shown below.

| A | B | X = A+B |
|---|---|---------|
| 0 | 0 | 0       |
| 0 | 1 | 1       |
| 1 | 0 | 1       |
| 1 | 1 | 1       |

## XOR Gate:

Previously implemented schematic/layout of XOR with NAND gates(16 Fets):

### 1. Schematic



### 2. Layout





**Figure 6.44** CMOS transmission gate.

**Solution 3: Transmission Gate Logic.** The most widely-used solution to deal with the voltage-drop problem is the use of *transmission gates*. It builds on the complementary properties of NMOS and PMOS transistors: NMOS devices pass a strong 0 but a weak 1, while PMOS transistors pass a strong 1 but a weak 0. The ideal approach is to use an NMOS to pull-down and a PMOS to pull-up. The transmission gate combines the best of both device flavors by placing a NMOS device in parallel with a PMOS device (Figure 6.44a). The control signals to the transmission gate ( $C$  and  $\bar{C}$ ) are complementary. The transmission gate acts as a bidirectional switch controlled by the gate signal  $C$ . When  $C = 1$ , both MOSFETs are on, allowing the signal to pass through the gate. In short,

$$A = B \quad \text{if} \quad C = 1 \quad (6.26)$$

On the other hand,  $C = 0$  places both transistors in cutoff, creating an open circuit between nodes  $A$  and  $B$ . Figure 6.44b shows a commonly used transmission-gate symbol.

Consider the case of charging node  $B$  to  $V_{DD}$  for the transmission gate circuit in Figure 6.45a. Node  $A$  is driven to  $V_{DD}$  and transmission gate is enabled ( $C = 1$  and  $\bar{C} = 0$ ). If only the NMOS pass-device were present, node  $B$  charges up to  $V_{DD} - V_{Tn}$  at which point the NMOS device turns off. However, since the PMOS device is present and turned on ( $V_{GSp} = -V_{DD}$ ), charging continues all the way up to  $V_{DD}$ . Figure 6.45b shows the opposite case, this is discharging node  $B$  to 0.  $B$  is initially at  $V_{DD}$  when node  $A$  is driven low. The PMOS transistor by itself can only pull down node  $B$  to  $V_{Tp}$  at which point it turns off. The parallel NMOS device however stays turned on (since its  $V_{GS} = V_{DD}$ ) and pulls node  $B$  all the way to  $GND$ . Though the transmission gate requires two transistors and more control signals, it enables rail-to-rail swing.



**Figure 6.45** Transmission gates enable rail-to-rail switching

Another example of the effective use of transmission gates is the popular XOR circuit shown in Figure 6.47. The complete implementation of this gate requires only six transistors (including the inverter used for the generation of  $B$ ), compared to the twelve transistors required for a complementary implementation. To understand the operation of this circuit, we have to analyze the  $B = 0$  and  $B = 1$  cases separately. For  $B = 1$ , transistors  $M_1$  and  $M_2$  act as an inverter while the transmission gate  $M_3/M_4$  is off; hence  $F = AB$ . In the opposite case,  $M_1$  and  $M_2$  are disabled, and the transmission gate is operational, or  $F = \bar{A}\bar{B}$ . The combination of both results in the XOR function. Notice that, regardless of the values of  $A$  and  $B$ , node  $F$  always has a connection to either  $V_{DD}$  or  $GND$  and is hence a low-impedance node. When designing static-pass transistor networks, it is essential to adhere to the low-impedance rule under all circumstances. Other examples where transmission-gate logic is effectively used are fast adder circuits and registers.



**Figure 6.47** Transmission gate XOR.

- Schematic

Previously XOR gate was implemented with NAND gates, the current XOR gate has been implemented with transmission gate logic to optimize for area . Six FETs are used here.



- Layout



- DRC



- LVS



- Symbol



## Test bench Schematic



- Output Simulations



## Half Adder:

- Schematic



- Layout



The screenshot shows the PVS-DSI DRC Debug Environment interface. The top menu bar includes 'Open Errors', 'View By Rule', 'Tools', and 'Top'. A toolbar below has icons for 'File', 'Edit', 'Search', 'Run', 'Stop', 'DRC', 'Report', 'Help', and 'Exit'. The main window displays a report titled '[DRC] DRC'.

The report content is as follows:

```
ONE LAYER BOOLEAN: Cumulative Time CPU = 0(s) REAL = 0(s)
TWO LAYER BOOLEAN: Cumulative Time CPU = 0(s) REAL = 0(s)
POLYGON TOPOLOGICAL: Cumulative Time CPU = 0(s) REAL = 0(s)
POLYGON MEASUREMENT: Cumulative Time CPU = 0(s) REAL = 0(s)
SIZE: Cumulative Time CPU = 0(s) REAL = 0(s)
EDGE TOPOLOGICAL: Cumulative Time CPU = 0(s) REAL = 0(s)
EDGE MEASUREMENT: Cumulative Time CPU = 0(s) REAL = 0(s)
STAMP: Cumulative Time CPU = 0(s) REAL = 0(s)
ONE LAYER DRC: Cumulative Time CPU = 0(s) REAL = 0(s)
TWO LAYER DRC: Cumulative Time CPU = 0(s) REAL = 0(s)
NET AREA: Cumulative Time CPU = 0(s) REAL = 0(s)
DENSITY: Cumulative Time CPU = 0(s) REAL = 0(s)
MISCELLANEOUS: Cumulative Time CPU = 0(s) REAL = 0(s)
CONNECT: Cumulative Time CPU = 0(s) REAL = 0(s)
DEVICE: Cumulative Time CPU = 0(s) REAL = 0(s)
ERC: Cumulative Time CPU = 0(s) REAL = 0(s)
PATTERN_MATCH: Cumulative Time CPU = 0(s) REAL = 0(s)

Total CPU Time : 1(s)
Total Real Time : 1(s)
Peak Memory Used : 18(M)
Total Original Geometry : 380(688)
Total DRC RuleChecks : 572
Total DRC Results : 0 (0)
Summary can be found in file halfadder_new.sum
ASCII report database is /tmp/src356/DRC/halfadder_new.drc_errors.ascii
Checking in all SoftShare licenses.

Design Rule Check Finished Normally. Fri Dec 4 02:36:46 2015
```

At the bottom, there is a search bar with fields for 'Find', 'Next', 'Previous', 'Highlight', 'Matchcase', and 'Whole word'.

- DRC



- LVS

- Symbol



- Test bench schematic



- Output Simulations



As observed from the graph the outputs 'sum' and 'carry' follow the logic function of a half-adder.

## Half Adder



| Inputs |   | Outputs |       |
|--------|---|---------|-------|
| A      | B | Sum     | Carry |
| 0      | 0 | 0       | 0     |
| 0      | 1 | 1       | 0     |
| 1      | 0 | 1       | 0     |
| 1      | 1 | 0       | 1     |

## Full Adder:

- Schematic



- Layout



- DRC



- LVS



- Symbol



- Test bench Schematic



- Output Simulations

As observed from the graph outputs ‘sum’ and ‘carry’ follow the logic function of full-adder.





(a)



(b)

## Full Adder



$$S = A \oplus B \oplus C_{in}$$

$$C_{out} = AB + (A \oplus B) C_{in}$$

| Inputs |   |                 | Outputs |       |
|--------|---|-----------------|---------|-------|
| A      | B | C <sub>in</sub> | Sum     | Carry |
| 0      | 0 | 0               | 0       | 0     |
| 0      | 0 | 1               | 1       | 0     |
| 0      | 1 | 0               | 1       | 0     |
| 0      | 1 | 1               | 0       | 1     |
| 1      | 0 | 0               | 1       | 0     |
| 1      | 0 | 1               | 0       | 1     |
| 1      | 1 | 0               | 0       | 1     |
| 1      | 1 | 1               | 1       | 1     |

