

# VLSI ASSIGNMENT – 3

Ayush Malpani(201530241)

1 a.)

In this isomorphic architecture of FIR filter parallel adders and multipliers are used



Total Area Req - 30 units

Latency – 17 ns

Throughput –  $5.5 \times 10^8$  instructions/ second

Edit code - EDA Playground - Mozilla Firefox

Problem loading page | Edit code - EDA Playgr | EPWave Waveform Vi | YouTube | Quick Start — EDA Pla | + 22:03 ayush

<https://www.edaplayground.com> Search

playground EDA Run Save\*

Brought to you by DOULOS Languages & Libraries Testbench + Design SystemVerilog/Verilog UVM / OVM None Other Libraries None OVL 2.8.1 SVUnit 2.1 Enable TL-Verilog Enable Easier UVM Tools & Simulators Icarus Verilog 0.9.6 Compile & Run Options Wall Run Options Open EPWave after run Download files after run Examples Community Collaborate Forum

SV/Verilog Testbench

```
testbench.sv
1
2
3 module test;
4 reg reset;
5 reg [1:0]x;
6 reg clock;
7 wire [3:0]out;
8 firfilter fil(x,clock,out,reset);
9 always
10 begin
#10 clock <= ~clock;
11 end
12 initial
13 begin
$dumpfile("dump.vcd");
$dumpvars(1);
clock = 1;reset = 1;
#20 reset=0;x= 0 ;
#20 x = 3 ;
#20 x = 2 ;

```

SV/Verilog Design

```
design.sv
1
2
3
4 dff d8 d17,clock,d18,reset);
5 assign m1 = d11>>h1;
6 assign m2 = d12>>h2;
7 adder a1(sum1,m1,m2);
8 assign m3 = d13>>h3;
9 adder a2(sum2,sum1,m3);
10 assign m4 = d14>>h4;
11 adder a3(sum3,sum2,m4);
12 assign m5 = d15>>h5;
13 adder a4(sum4,m5,m6);
14 assign m6 = d16>>h6;
15 adder a5(sum5,sum4,m6);
16 assign m7 = d17>>h7;
17 adder a6(sum6,sum5,m7);
18 assign m8 = d18>>h8;
19 adder a7(outvalue,sum6,m8);
20 //initial begin
21 //monitor("time=%d m1=%d m2=%d m3=%d",$time,d11,d12,d13);
22 //end
23 endmodule

```

Log Share

[2018-02-26 11:30:18 EST]iverilog '-Wall' design.sv testbench.sv && unbuffer vvp a.out

VCD info: dumpfile dump.vcd opened for output.

time= 0 input=x output= 0

time= 20 input=0 output= 0

time= 40 input=2 output= 1

time= 60 input=3 output= 1

time= 80 input=2 output= 1

Finding VCD file... ./dump.vcd

[2018-02-26 11:30:18 EST] Opening EPWave...

Done



1 b.) Iteratively decomposed architecture with only one multiplier and one adder.

In this architecture mux is used to select one of the input and to give the required input to the multiplier and selecting the required multiplicand value from the lookup table.



## Area

for mux = 1 unit

for each multiplier = 2 unit

for each adder = 1 unit

for look up table = 2 unit

for each mod circuit = 1 unit

total area =  $2 + 1 + 1 + 2 + 1$

$$= \underline{\underline{7 \text{ units}}}$$

## Latency

for mux = 1 ns

for multiplier = 3 ns.

for adder = 2 ns.

for rest components = 1 ns.

$$= 8 \times 1 + 8 \times 3 + 8 \times 2$$

(mux)

~~but~~

will

be working

in parallel.

$$+ 8 \times 1$$

( $+ 1 \text{ mod } N + 1$ )

$$= 8 + 24 + 16 + 8$$

$$= 48 \text{ ns.} + 8$$

$$= \underline{\underline{56 \text{ ns}}}$$

## Throughput

for mux = 1 ns

for multiplier = 3 ns

for adder = 2 ns.

for rest components = 1 ns.

$$= 8 \times 1 + 8 \times 3 + 8 \times 2 + 8 \times 1$$

$$= 8 + 24 + 16 + 8$$

$$= \underline{\underline{56 \text{ ns}}}$$

$$\therefore \text{Throughput} = \left( \frac{10^9}{56} \right)$$

$$= \underline{\underline{178 \times 10^5 \text{ instructions/sec}}}$$

1 c.)

(D)(c.)



Area

for each multiplier = 2 units

for each adder = 1 unit

for each d-flip flop = 1 unit

$$\text{total area} = 8 \times 2 + 7 \times 1 + 7 \times 1$$

$$= \underline{\underline{30 \text{ units}}}$$

Latency

for each multiplier = 3 ns

for each adder = 2 ns

$$\text{Rest} = 1 \text{ ns.}$$

Since for 1st output only last multiplier and adder are used.

$$\therefore \text{latency} = 3 \text{ ns} + 2 \text{ ns} \\ = 5 \text{ ns}$$

Throughput

for each ~~instruction~~ calculation

d-flip and multiplier will act in parallel.

∴ for each ~~instruction~~ calculation

one mult. and adder will be used.

∴ 5 ns → 1 ns calculation

$$1 \text{ s} \rightarrow \frac{10^9}{5} = \underline{\underline{(2 \times 10^8)}}$$

$$\text{Area} = 30 \text{ units}$$

$$\text{Latency} = 5 \text{ ns}$$

$$\text{Throughput} = 2 * 10^8 \text{ instructions/second}$$



1 d.)

|            | Isomorphic architecture      | With one multiplier adder     | Pipelined                  |
|------------|------------------------------|-------------------------------|----------------------------|
| Area       | 30 units                     | 7 units                       | 30 units                   |
| Latency    | 17 ns                        | 56 ns                         | 5 ns                       |
| Throughput | $55 * 10^6$ instructions/sec | $178 * 10^5$ instructions/sec | $2 * 10^8$ instruction/sec |

2 a.)

(2) a.)



$$\text{Total area} = 1 + 2 + 1 + 1 + 1 + 1 = 7 \text{ units.}$$

~~Latency~~ latency =  $2ns$  +  $3ns$  +  $2ns$  +  $2ns$  +  $2ns$  +  $2ns$   
(Subtractor) (Multiplier) (Subtractor) (Max) (Adder)  
+ modulus).  
=  $11ns$ .

$$\text{Throughput} = 11 \text{ ns}$$

$$1s = \left(\frac{10^9}{11}\right) = \underline{\underline{9 \times 10^7 \text{ instructions / second}}}$$





AFLA.

for each mult = 2 units.

for each adder = 1 unit

rest components = 1 units.

= 11 units.

latency

20 clock cycles

$$= 70 \text{ units} \times 2$$

$$= 140 \text{ ns}$$

5 units clock cycle because critical path is of 7 units in 2nd stage 3.

Throughput

Assuming unit to be

$$1 \text{ instruction} = 70 \text{ ns}.$$

∴ total throughput

$$= \frac{10^9}{70} = \frac{14}{7} \times 10^8 \text{ instructions/s}$$



2c.)

|            | Pipelined                       | Without Pipelined              |
|------------|---------------------------------|--------------------------------|
| Area       | 11 units                        | 7 units                        |
| Latency    | 14 ns                           | 11 ns                          |
| Throughput | $14 * 10^7$ instructions/second | $9 * 10^7$ instructions/second |

2d.) The area in the first case i.e the non-pipelined case comes out to be 7 units while in the pipelined case it comes out to be 11 units. Therfore the non-pipeline architecture is more efficient in terms of area as compared to the pipeline architecture.

There is an increase in the throughput in the pipelined case.