

# **Tutorial 4**

# **ECE1388 Project 3**

# **4x4 Array Multiplier**

Mustafa Kanchwala

ECE1388

# What is our target?

- Example:

$$\begin{array}{rll} 1100 & : 12_{10} & \text{multiplicand} \\ \underline{0101} & : 5_{10} & \text{multiplier} \\ 00111100 & : 60_{10} & \text{product} \end{array}$$

- $M \times N$-bit multiplication

# General Form

- Multiplicand:  $Y = (y_{M-1}, y_{M-2}, \dots, y_1, y_0)$
- Multiplier:  $X = (x_{N-1}, x_{N-2}, \dots, x_1, x_0)$

- Product:  $P = \left( \sum_{j=0}^{M-1} y_j 2^j \right) \left( \sum_{i=0}^{N-1} x_i 2^i \right) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} x_i y_j 2^{i+j}$

|          |          |          |          |          |          |          |          |          |          |          |          |                  |
|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|------------------|
|          |          |          |          |          |          | $y_5$    | $y_4$    | $y_3$    | $y_2$    | $y_1$    | $y_0$    | multiplicand     |
|          |          |          |          |          |          | $x_5$    | $x_4$    | $x_3$    | $x_2$    | $x_1$    | $x_0$    | multiplier       |
|          |          |          |          |          |          | $x_0y_5$ | $x_0y_4$ | $x_0y_3$ | $x_0y_2$ | $x_0y_1$ | $x_0y_0$ | partial products |
|          |          |          |          |          | $x_1y_5$ | $x_1y_4$ | $x_1y_3$ | $x_1y_2$ | $x_1y_1$ | $x_1y_0$ |          |                  |
|          |          |          |          | $x_2y_5$ | $x_2y_4$ | $x_2y_3$ | $x_2y_2$ | $x_2y_1$ | $x_2y_0$ |          |          |                  |
|          |          |          | $x_3y_5$ | $x_3y_4$ | $x_3y_3$ | $x_3y_2$ | $x_3y_1$ | $x_3y_0$ |          |          |          |                  |
|          |          | $x_4y_5$ | $x_4y_4$ | $x_4y_3$ | $x_4y_2$ | $x_4y_1$ | $x_4y_0$ |          |          |          |          |                  |
|          | $x_5y_5$ | $x_5y_4$ | $x_5y_3$ | $x_5y_2$ | $x_5y_1$ | $x_5y_0$ |          |          |          |          |          |                  |
|          |          |          |          |          |          |          |          |          |          |          |          |                  |
| $p_{11}$ | $p_{10}$ | $p_9$    | $p_8$    | $p_7$    | $p_6$    | $p_5$    | $p_4$    | $p_3$    | $p_2$    | $p_1$    | $p_0$    | product          |
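The double sum in the product formula can be checked numerically. The following is a minimal Python sketch, not part of the original tutorial; the helper names `to_bits` and `partial_product_sum` are illustrative assumptions, and bits are stored least significant first.

```python
def to_bits(value, width):
    """Return the bits of value as a list, least significant bit first."""
    return [(value >> k) & 1 for k in range(width)]


def partial_product_sum(y_bits, x_bits):
    """Evaluate P = sum_i sum_j x_i * y_j * 2^(i+j) one bit pair at a time."""
    product = 0
    for i, x_i in enumerate(x_bits):        # one partial product per multiplier bit
        for j, y_j in enumerate(y_bits):    # each term is the AND of two bits
            product += (x_i & y_j) << (i + j)
    return product


# Worked example from the slides: 1100 (12) times 0101 (5)
y = to_bits(0b1100, 4)   # multiplicand
x = to_bits(0b0101, 4)   # multiplier
print(format(partial_product_sum(y, x), "08b"))   # prints 00111100 (60)
```

Each inner-loop term corresponds to one cell of the partial-product table, and the shift by `i + j` is what places that cell in its column.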

# Multiplication

- Example: each partial-product bit is one term $x_i y_j$ of the sum

$$\sum_{i=0}^{N-1} \sum_{j=0}^{M-1} x_i y_j 2^{i+j}$$

$$\begin{array}{ccccc} & y_3 x_0 & y_2 x_0 & y_1 x_0 & y_0 x_0 \\ y_3 x_1 & y_2 x_1 & y_1 x_1 & y_0 x_1 & \end{array}$$

$$\begin{array}{rll} 1100 & : 12_{10} & \text{multiplicand} \\ 0101 & : 5_{10} & \text{multiplier} \\ \hline 1100 & & \text{partial products} \\ 0000\phantom{0} & & \end{array}$$

- $M \times N$-bit multiplication
  - Produce $N$ $M$-bit partial products
  - Sum these to produce $M+N$-bit product

# Multiplication

- Example:

$$\begin{array}{rll} 1100 & : 12_{10} & \text{multiplicand} \\ 0101 & : 5_{10} & \text{multiplier} \\ \hline 1100 & & \text{partial products} \\ 0000\phantom{0} & & \\ 1100\phantom{00} & & \\ 0000\phantom{000} & & \\ \hline 00111100 & : 60_{10} & \text{product} \end{array}$$

- $M \times N$-bit multiplication
  - Produce $N$ $M$-bit partial products
  - Sum these to produce $M+N$-bit product

How do we perform multi-input addition?

# Single-Bit Addition

Half Adder

$$S = A \oplus B$$

$$C_{\text{out}} = A \bullet B$$



| A | B | C<sub>out</sub> | S |
|---|---|------------------|---|
| 0 | 0 | 0                | 0 |
| 0 | 1 | 0                | 1 |
| 1 | 0 | 0                | 1 |
| 1 | 1 | 1                | 0 |

Full Adder

$$S = A \oplus B \oplus C$$

$$C_{\text{out}} = MAJ(A, B, C)$$



| A | B | C | C<sub>out</sub> | S |
|---|---|---|------------------|---|
| 0 | 0 | 0 | 0                | 0 |
| 0 | 0 | 1 | 0                | 1 |
| 0 | 1 | 0 | 0                | 1 |
| 0 | 1 | 1 | 1                | 0 |
| 1 | 0 | 0 | 0                | 1 |
| 1 | 0 | 1 | 1                | 0 |
| 1 | 1 | 0 | 1                | 0 |
| 1 | 1 | 1 | 1                | 1 |
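The two truth tables follow directly from the equations above. A minimal Python sketch, with function names chosen here for illustration, models both cells and prints the full-adder table in the same column order:

```python
def half_adder(a, b):
    """Return (carry_out, sum) for two input bits."""
    return a & b, a ^ b


def full_adder(a, b, c):
    """Return (carry_out, sum) for three input bits."""
    s = a ^ b ^ c
    c_out = (a & b) | (a & c) | (b & c)   # MAJ(A, B, C): 1 when at least two inputs are 1
    return c_out, s


# Print the full-adder truth table: A B C Cout S
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            c_out, s = full_adder(a, b, c)
            print(a, b, c, c_out, s)
```

In both cells the outputs encode the count of ones at the inputs: `2 * c_out + s` equals the number of inputs that are 1.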

# Carry Propagate Adders

- An N-bit adder is called a carry-propagate adder (CPA)
  - Each sum bit depends on all previous carries
  - How do we compute all these carries quickly?



Figure: carry propagation in a 4-bit addition of $A_{4\dots1}$ and $B_{4\dots1}$ producing the sum $S_{4\dots1}$. Arrows mark the carry passed from each bit position to the next more significant one.

$$\begin{array}{r} 1111 \\ + 0000 \\ \hline 1111 \end{array}$$

With a carry-in of 0, no column generates a carry here, so every internal carry and the final carry-out is 0.
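The dependence of each sum bit on all lower carries is easiest to see in a ripple-carry adder, the simplest CPA. The sketch below is an illustrative Python model (the function names are assumptions, and bit lists are least significant bit first), not a description of any particular hardware implementation.

```python
def full_adder(a, b, c):
    """Return (carry_out, sum) for three input bits."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Add two equal-length bit lists (LSB first); return (sum_bits, carry_out)."""
    carry = carry_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # Each position must wait for the carry from the position below it
        carry, s = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


ones = [1, 1, 1, 1]
zeros = [0, 0, 0, 0]
print(ripple_carry_add(ones, zeros, 0))   # ([1, 1, 1, 1], 0): no carry is produced
print(ripple_carry_add(ones, zeros, 1))   # ([0, 0, 0, 0], 1): the carry ripples through every bit
```

The second call is the worst case for this structure: a single carry-in changes every sum bit and the carry-out, so the delay grows with the word length.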

# Multi-input Adders

- Suppose we want to add $k$ $N$-bit words
  - Ex: $0001 + 0111 + 1101 + 0010 = 10111$ (in decimal, $1 + 7 + 13 + 2 = 23$)
- Straightforward solution: chain $k-1$ $N$-bit CPAs
  - Large and slow



# Carry Save Addition

- A full adder sums 3 inputs and produces 2 outputs
  - The carry output has twice the *weight* of the sum output
- N full adders in parallel are called a *carry-save adder* (CSA)
  - They produce N sums and N carry-outs
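As a rough illustration of the idea, the Python sketch below reduces three words to a sum word and a carry word with independent full adders, then applies this repeatedly until only two words remain for a final carry-propagate addition. The function names and the `width` parameter are assumptions made for this example; `width` must be large enough to hold the final total.

```python
def carry_save_add(x, y, z, width):
    """Reduce three words to (sum_word, carry_word) using `width` independent full adders."""
    sum_word = 0
    carry_word = 0
    for k in range(width):
        a, b, c = (x >> k) & 1, (y >> k) & 1, (z >> k) & 1
        s = a ^ b ^ c
        c_out = (a & b) | (a & c) | (b & c)
        sum_word |= s << k
        carry_word |= c_out << (k + 1)   # the carry has twice the weight of the sum
    return sum_word, carry_word


def multi_input_add(words, width):
    """Add several words: carry-save stages first, one carry-propagate addition last."""
    words = list(words)
    while len(words) > 2:
        s, c = carry_save_add(words[0], words[1], words[2], width)
        words = words[3:] + [s, c]
    return sum(words)   # stands in for the final CPA


# Example from the slides: 0001 + 0111 + 1101 + 0010
total = multi_input_add([0b0001, 0b0111, 0b1101, 0b0010], width=6)
print(format(total, "b"))   # prints 10111
```

No carry moves between bit positions inside `carry_save_add`, which is why each stage has the delay of a single full adder regardless of word length.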



# Array Multiplier



# Design

- Using Verilog, describe the digital logic of an unsigned array multiplier to calculate the unsigned product of two unsigned 4-bit numbers. Provide the full Verilog code in your report.
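Before writing the Verilog, it can help to have a bit-level reference model to compare against. The following Python sketch is only an assumed behavioural model, not the required Verilog deliverable: it forms the 16 partial-product bits with AND operations and accumulates them with one row of full adders per multiplier bit. It uses ripple-carry rows for simplicity, so the routing of carries may differ from the array structure in the figure, but the product is the same.

```python
def full_adder(a, b, c):
    """Return (carry_out, sum) for three input bits."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c


def array_multiply_4x4(a, b):
    """Bit-level model of an unsigned 4x4 multiplier built from AND gates and full adders."""
    a_bits = [(a >> j) & 1 for j in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    # pp[i][j] = b_i AND a_j, with weight 2^(i+j)
    pp = [[b_bits[i] & a_bits[j] for j in range(4)] for i in range(4)]

    product = [0] * 8
    running = pp[0] + [0]          # five bits with weights 2^0 .. 2^4
    product[0] = running[0]
    for i in range(1, 4):
        carry = 0
        row = []
        for j in range(4):
            # running[j + 1] and pp[i][j] both have weight 2^(i+j)
            carry, s = full_adder(running[j + 1], pp[i][j], carry)
            row.append(s)
        row.append(carry)
        running = row              # five bits with weights 2^i .. 2^(i+4)
        product[i] = running[0]    # lowest bit of each row is a final product bit
    product[4:8] = running[1:5]
    return sum(bit << k for k, bit in enumerate(product))


print(array_multiply_4x4(12, 5))   # prints 60
```

Checking all 256 input pairs against `a * b` mirrors what an exhaustive test bench does for the hardware description.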



# Simulation

- You have been provided with an exhaustive Verilog test bench to verify the functionality of your 4x4 multiplier.
- You can use NCVerilog (or ModelSim) to run the test bench. For NCVerilog, use the following Linux command:  
`ncverilog multiplier.v testbench.v`
  - For ModelSim, follow the guidance in the VLSI User Manual.
- You must include `` `timescale 1ns/1ps `` before your multiplier module declaration.
- Provide the output of the test bench dialogue in your report to show the functionality of your multiplier.

# **Synthesis**


# 1

- Place the synthesis.tcl file in the same working directory as the Verilog test bench and your multiplier design.
- This TCL script is run in the Synopsys Design Compiler tool to synthesize your multiplier using the digital standard cells in the TSMC 65nm library.
- Run the Design Compiler GUI with the following Linux command: `design_vision`

# 2

- Perform the synthesis of your Verilog multiplier design by executing the synthesis script in Design Vision: File > Execute Script
- The compilation (compile\_ultra) will take a few minutes to complete.
- Once complete, you can view the gate-level schematic of your synthesized multiplier by right-clicking on the multiplier design in the Logical Hierarchy window and selecting the schematic view.
- Save a copy of the schematic:
  - File > Print > Print to File (PDF).

# 3

- The Synopsys Design Compiler tool will output the following files into the working directory:
  - Synthesized Verilog netlist: multiplier\_syn.v
  - Design constraint file: multiplier\_syn.sdc
  - Timing and area reports
- In your report, provide the generated estimates for total area and critical path timing.
- If you were to clock the design, what would be the maximum clock frequency?
- In your report, include the schematic of the synthesized multiplier, and highlight the critical path based on the timing report.
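For the clock-frequency question, a first-order estimate takes the clock period to be no shorter than the critical path delay. The Python sketch below assumes exactly that and ignores register setup time, clock-to-output delay, and clock skew; the function name and the delay value are hypothetical and are not taken from any timing report.

```python
def max_clock_frequency_mhz(critical_path_ns):
    """First-order upper bound on clock frequency for a given critical path delay in ns."""
    if critical_path_ns <= 0:
        raise ValueError("critical path delay must be positive")
    return 1e3 / critical_path_ns   # 1/ns is GHz; multiply by 1000 for MHz


# Hypothetical delay for illustration only
print(max_clock_frequency_mhz(0.5))   # prints 2000.0
```

Substitute the critical path delay from your own timing report; a registered design would also need the register overheads added to the period.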

# 4

- Verify that your synthesis was successful by running the multiplier test bench with the newly synthesized netlist and the TSMC 65nm standard cell Verilog technology file:  
`ncverilog testbench.v multiplier_syn.v tcbn65gplus.v`
- Prior to running NCVerilog, ensure that the module names in the generated Verilog files match those used by the test bench.
- Provide the output of the NCVerilog test bench dialogue in your report.

# **Physical Implementation**




# 1

- Place the innovus.globals and multiplier.view files in the same working directory as the multiplier\_syn.v and multiplier\_syn.sdc files.
- Run the Cadence Innovus place & route tool with the following command: `innovus`
- Set up the design: File > Import Design > Load > innovus.globals
- This will load the appropriate timing and physical cell libraries for the place & route.




# 2

- Define Floorplan
- Use Edit > Pin Editor
  - Place A[] on left
  - Place B[] on right
  - Place product[] on top



# 2

- Place power stripes for VDD and VSS:
  - Power > Power Planning > Add Stripe
  - Select the M3 layer with Width: 1 and Spacing: 1
  - Select the M4 layer with Width: 1 and Spacing: 1
  - Set the set-to-set distance to 4
  - Press Apply and OK.
- You have now set up the power connections for the VDD and VSS rails.






# 3

- Place the standard cells:
  - Place > Place Standard Cell > OK
- Once complete, you can view the placed cells:
  - Place > Display > Display Spare Cell
- Verify that the design does not have any geometry errors:
  - Verify > Verify Geometry > OK



# 4

- Route the power connections:
  - Route > Special Route
- You should now see VDD and VSS rails of standard cells connected to the VDD and VSS stripes
- Route the signal connections:
  - Route > NanoRoute
- Verify that the design does not have any geometry errors:
  - Verify > Verify Geometry






# 5

- Add filler cells to fill the unused area of your design:
  - Place > Physical Cell > Add Filler
  - Under Cell Name(s), select all the available filler cells from the cell list.
- Verify that the design does not have any geometry errors:
  - Verify > Verify Geometry > OK
- A cool feature to see congestion



# 6

- Perform a power analysis:
  - Power > Power Analysis > Run
- Perform static power analysis for a constant input activity factor of 0.2, over the frequency range of 100 MHz to 1 GHz.
- Plot your results for total internal power and total switching power vs. frequency on the same graph.
- Comment on these results with respect to the synthesized timing results obtained in Part 2.2.
- What do the power analysis estimates suggest is the maximum clock frequency?

# 7

- Obtain the total area of your placed & routed design:
  - File > Report > Summary
- Compare the resulting total core area after place & route with the estimated synthesis area from Part 2.2. Comment on the discrepancy.

# 8

- Export the Verilog netlist of the placed & routed design: File > Save > Netlist
- Save the file as multiplier\_netlist.v, and verify that your place & route was successful by running the multiplier test bench with the post-route netlist and the TSMC 65nm standard cell Verilog technology file:  
`ncverilog testbench.v multiplier_netlist.v tcbn65gplus.v`
- Provide the output of the NCVerilog test bench dialogue in your report.

# Some Tips



**END**