

# 4-bit Carry Look Ahead Adder Design

## VLSI Design Course Project

Aanchal Amit Mundhada

2023112016

Electronics and Communication Dual Degree

International Institute of Information Technology, Hyderabad

[aanchal.mundhada@research.iiit.ac.in](mailto:aanchal.mundhada@research.iiit.ac.in)

**Abstract —** This report presents a design of 4-bit carry look ahead adder which has inputs taken from D flip flop and outputs also pass through the D flip flop.

**Keywords—** adder, carry look ahead adder, D flip flop, sum block, CLA, Propagate Generate

### I. INTRODUCTION

The adder in this project is designed using basic digital logic components like AND gates, OR gates, and inverters, which are combined to form full adders. Each full adder calculates the sum and carry for a pair of input bits. The adder follows a ripple-carry structure, where the carry output from one bit is passed to the next, starting from the least significant bit to the most significant bit. If the numbers to be added are a<sub>3</sub>a<sub>2</sub>a<sub>1</sub>a<sub>0</sub> and b<sub>3</sub>b<sub>2</sub>b<sub>1</sub>b<sub>0</sub>, then the propagate (pi) and generate (gi) signals for each bit position can be defined as,

$$pi = ai \oplus bi$$

$$gi = ai \cdot bi$$

and the carry out ( $c(i+1)$ ) of the i<sup>th</sup> bit position can be written as,

$$c(i+1) = (pi \cdot ci) + gi, i = 0, 1, 2, 3$$

Thus,  $c(i+1)$  can be expressed entirely in terms of the  $pi$  and  $gi$  functions and sum can be represented as follows,

$$sum_i = pi \oplus ci$$

TSPC D flip-flops are included to manage timing and hold intermediate values, ensuring the circuit operates reliably in clocked systems. This modular design allows for easy expansion to handle larger bit-widths, making it adaptable for various applications.



Fig. 1. Adder

### II. DESIGN DETAILS

The circuit is made by combining logic gates AND, OR and XOR. It also uses inverters and D flip flops.

The propagate generate block takes input  $a_i$  and  $b_i$  from D flip flop and using 2 input AND gate we calculate  $g_i$  as  $g_i = a_i \cdot b_i$ . On using 2 input XOR gate we calculate  $p_i$  as  $p_i = a_i \oplus b_i$ .

For carry look ahead block the inputs are  $p_i$ ,  $g_i$  and  $c_0$ . We calculate  $c_1$ ,  $c_2$ ,  $c_3$  and  $c_4$ .

$c_1 = g_0 + p_0 c_0$ , here we use 2 input AND gate for  $p_0 c_0$  and then 2 input OR gate to combine  $g_0$  and  $p_0 c_0$  to calculate our  $c_1$ .

$c_2 = g_1 + p_1 g_0 + p_1 p_0 c_0$ , here we use 3 input AND gate for  $p_1 p_0 c_0$  and 2 input AND gate for  $p_1 g_0$ , then 3 input OR gate to combine  $g_1$ ,  $p_1 g_0$  and  $p_1 p_0 c_0$  to calculate our  $c_2$ .

$c_3 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$ , here we use 4 input AND gate for  $p_2 p_1 p_0 c_0$ , 3 input AND gate for  $p_2 p_1 g_0$  and 2 input AND gate for  $p_2 g_1$ , then 4 input OR gate to combine  $g_2$ ,  $p_2 g_1$ ,  $p_2 p_1 g_0$  and  $p_2 p_1 p_0 c_0$  to calculate our  $c_3$ .

$c_4 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0$ , here we use 5 input AND gate for  $p_3 p_2 p_1 p_0 c_0$ , 4 input AND gate for  $p_3 p_2 p_1 g_0$ , 3 input AND gate for  $p_3 p_2 g_1$  and 2 input AND gate for  $p_3 g_2$ , then 5 input OR gate to combine  $g_3$ ,  $p_3 g_2$ ,  $p_3 p_2 g_1$ ,  $p_3 p_2 p_1 g_0$  and  $p_3 p_2 p_1 p_0 c_0$  to calculate our  $c_4$ .

(Note: It is given that  $c_0$  is zero, so we don't really need 5 input AND and OR gates. I have used % input gates in NGSim but not in MAGIC)

For sum block, we use 2 input XOR gate with inputs  $c_i$  and  $p_i$  calculated from previous blocks as  $sum_i = p_i \oplus c_i$ .

We get outputs  $s_0$ ,  $s_1$ ,  $s_2$ ,  $s_3$  and  $c_4$  which is passed through D flip flop.

#### A. Gates

For all the gates we use CMOS Topology. Their circuit diagram along with sizing are given below.

##### i. AND gates

Sizing:  $W_p$  and  $NW_n$ ,  $N$  is number of inputs.



2 - Input AND Gate

(a)



3- Input AND Gate

(b)



4- Input AND Gate

(c)

Fig. 2. (a)2 input AND (b) 3 input AND (c)4 input AND

##### ii. OR gates

Sizing:  $NW_p$  and  $W_n$ ,  $N$  is number of inputs.



2 - Input OR Gate

(a)



3 - Input OR Gate

(b)



4 - Input OR Gate

(c)

Fig. 3. (a)2 input OR (b) 3 input OR (c)4 input OR

### iii. XOR gate

XOR gate made by using 2 inverters, 2 AND gates and 1 OR gate as shown in Fig. 4



XOR Gate

Fig. 4. 2 input XOR gate

### B. TSPC flip flop



TSPC Flip Flop

Fig. 5. TSPC flip flop

## III. PRE-LAYOUT SIMULATION IN NGSPICE

### A. CLA adder

#### i. Propagate and generate

For, A=1110 and B=1011 we get following outputs,

p0=1, p1=0, p2=1, p3=0 and g0=0, g1=1, g2=0, g3=1



(a)



(b)

Fig. 6. (a) Propagate (b) Generate

## ii. Carry look ahead

For,  $p0=0$ ,  $p1=1$ ,  $p2=1$ ,  $p3=1$  and  $g0=1$ ,  $g1=1$ ,  $g2=0$ ,  $g3=1$ ,  $c0=0$  we get the following outputs,

$c1=1$ ,  $c2=1$ ,  $c3=1$ ,  $c4=1$



Fig. 7. Carry look ahead block

## iii. Sum block

For  $c0=0$ ,  $c1=1$ ,  $c2=1$ ,  $c3=1$  and  $p0=1$ ,  $p1=1$ ,  $p2=0$ ,  $p3=1$  we get following outputs,

$s0=1$ ,  $s1=0$ ,  $s2=1$ ,  $s3=0$



Fig. 8. Sum block

## iv. Complete adder

For  $A=1110$  and  $B=1011$  we get sum= $1001$  and cout= $1$



Fig. 9. Adder (pre layout)

We get tpd\_max=0.72ns for s3.

```
tpd_rise      = 7.116351e-10 targ= 7.616351e-10 trig= 5.000000e-11
tpd_fall      = 7.343098e-10 targ= 1.088431e-08 trig= 1.015000e-08
tpd_max       = 7.22972e-10
```

Fig. 10. tpd\_max (pre layout)

#### B. D flip flop

As we can see, X is updated to A at positive edge only.



Fig. 11. D flip flop

#### IV. SETUP TIME, HOLD TIME AND PROPAGATION DELAY FROM CLK TO Q IN FLIPFLOP

We have used TSPC topology for our D-FlipFlop. Since there is no use of clk', we have zero hold time (thold = 0).

For setup time, we did trial and error and found it to be tsetup = 0.1019ns. Any input given less than tsetup before next +ve edge, either input is not registered or output is being corrupted.

Propagation delay from clk to Q will be 0.087ns.

```
tpd_rise      = 5.761607e-11 targ= 2.010762e-08 trig= 2.005000e-08
tpd_fall      = 1.167702e-10 targ= 4.016677e-08 trig= 4.005000e-08
total_prop_delay = 8.71931e-11
```

Fig. 12. Total propagation delay (pre layout)

#### V. STICK DIAGRAM OF ALL UNIQUE GATES AND FLIPFLOP



Fig. 13. Inverter stick diagram



Fig. 14. 2 input AND gate stick diagram



Fig. 15. 3 input AND gate stick diagram



Fig. 18. 3 input OR gate stick diagram



Fig. 16. 4 input AND gate stick diagram



Fig. 19. 4 input OR gate stick diagram



Fig. 17. 2 input OR gate stick diagram



Fig. 20. TSPC flip flop stick diagram

## VI. POST LAYOUT SIMULATION FOR MAGIC

### A. CLA adder

#### i. Propagate and generate



Fig. 21. Propagate Generate MAGIC layout

For, A=1110 and B=1011 we get following outputs,

$p_0=1, p_1=0, p_2=1, p_3=0$  and  $g_0=0, g_1=1, g_2=0, g_3=1$



Fig. 22. (a)Post layout propagate (b)Post layout generate

#### ii. Carry look ahead



Fig. 23. Carry look ahead MAGIC layout

For,  $p_0=0$ ,  $p_1=1$ ,  $p_2=1$ ,  $p_3=1$  and  $g_0=1$ ,  $g_1=1$ ,  $g_2=0$ ,  $g_3=1$ ,  $c_0=0$  we get the following outputs,

$c_1=1$ ,  $c_2=1$ ,  $c_3=1$ ,  $c_4=1$



Fig. 24. Post layout carry look ahead block

### iii. Sum block



Fig. 25. Sum block MAGIC layout

For  $c_0=0$ ,  $c_1=1$ ,  $c_2=1$ ,  $c_3=1$  and  $p_0=1$ ,  $p_1=1$ ,  $p_2=0$ ,  $p_3=1$  we get following outputs,

$s_0=1$ ,  $s_1=0$ ,  $s_2=1$ ,  $s_3=0$



Fig. 26. Post layout sum block

#### iv. Complete adder



Fig. 27. Adder MAGIC layout

For A=1110 and B=1011 we get sum=1001 and cout=1



Fig. 28. Post layout adder

We get tpd\_max=0.82ns for s3.

```
tpd_rise      = 8.036613e-10 targ= 8.536613e-10 trig= 5.000000e-11
tpd_fall     = 8.453609e-10 targ= 1.099536e-08 trig= 1.015000e-08
tpd_max      = 8.24511e-10
```

Fig. 29. tpd\_max (post layout)

#### B. D flip flop



Fig. 30. D flip flop MAGIC layout

As we can see, X is updated to A at positive edge only.



Fig. 31. Post layout D flip flop

Total propagation delay will be 0.10ns.

```
tpd_rise      = 7.060231e-11 targ= 2.012060e-08 trig= 2.005000e-08
tpd_fall     = 1.346496e-10 targ= 4.018465e-08 trig= 4.005000e-08
total_prop_delay = 1.02626e-10
```

Fig. 32. Total propagation delay (post layout)

|                                   | Pre-layout | Post-layout |
|-----------------------------------|------------|-------------|
| Propagation delay for D flip flop | 0.0872ns   | 0.1026ns    |
| tpd_max for adder                 | 0.723ns    | 0.8245ns    |

Table 1. Propagation delay and tpd\_max comparison

## VII. SIMULATIONS OF FINAL CIRCUIT IN NGSPICE

We combine the CLA-Adder with D-Flipflops to give input at positive edge and get output at second positive edge.

For A=1110 and B=1011 we get sum as 1001 and cout as 1 at the second positive edge of the clock. This means our circuit is working correctly.



Fig. 33. Final circuit (pre layout)

Worst case delay seen in pre-layout simulation is 0.723ns.

For maximum clock frequency we find minimum clock time.

$$t_{clk\_min} = t_{hold} + t_{pd\_max} + t_{pcq\_max}$$

$$t_{clk\_min} = 0 + 0.72\text{ns} + 0.087\text{ns} = 0.807\text{ns}$$

Hence,  $f_{max} = 1239.16 \text{ MHz}$ .

Maximum clock frequency at which our circuit operates correctly is 1239.16 MHz.

## VIII. FLOOR PLAN AND MAGIC LAYOUT FOR FINAL CIRCUIT



Fig. 34. Final circuit MAGIC layout

This is the complete floor plan of final circuit with Horizontal Pitch 167.13 $\mu\text{m}$  and Vertical Pitch 97.65 $\mu\text{m}$  with area 16320( $\mu\text{m}$ )<sup>2</sup>.

## IX. POST LAYOUT SIMULATIONS OF FINAL CIRCUIT

For A=1110 and B=1011 we get sum as 1001 and cout as 1 at the second positive edge of the clock. It matches with the pre-layout simulation.



Fig. 35. Final circuit (post layout)

## X. DELAY OF CLA ADDER AND MAXIMUM CLOCK FREQUENCY

Worst case delay seen in pre-layout simulation is 0.8245ns.

For maximum clock frequency we find minimum clock time.

$$tclk\_min = thold + tpd\_max + tpcq\_max$$

$$tclk\_min = 0 + 0.82\text{ns} + 0.103\text{ns} = 0.823\text{ns}$$

Hence,  $f_{max} = 1215.07 \text{ MHz}$ .

Maximum clock frequency at which our circuit operates correctly is 1215.07 MHz.

|          | Pre-layout  | Post-layout |
|----------|-------------|-------------|
| tclk_min | 0.807ns     | 0.823ns     |
| fclk_max | 1239.16 MHz | 1215.07 MHz |

Table 2. tclk\_min and fclk\_max comparison

Post layout time is more than pre layout time.

## XI. VERILOG SIMULATIONS AND WAVEFORMS

This 4-bit carry-lookahead adder (CLA) consists of three main components: D flip-flops, a carry-lookahead adder, and a top-level module integrating the two. The D flip-flops are used to store input and output values, ensuring that data is synchronized with the clock signal. Each input bit of the 4-bit numbers (A and B) and the carry-in (Cin) is stored in separate flip-flops before being passed to the adder. Similarly, the output bits of the adder, including the sum (S[3:0]) and carry-out (Cout), are passed through another set of flip-flops to stabilize the output and synchronize it with the clock.

The core arithmetic operation is performed by the carry-lookahead adder. It computes the propagate and generate signals for each bit and uses these to calculate the carry signals hierarchically. The carry-lookahead logic allows faster addition by directly computing carries rather than waiting for sequential propagation. The use of D flip-flops at both input and output ensures proper timing synchronization, making the circuit suitable for clocked or pipelined systems.

### A. CLA adder

```
time= 5   rst=0  clk=1  a0=0  a1=0  a2=1  a3=0  b0=1  b1=0  b2=0  b3=0  cin=0  s0=1  s1=0  s2=1  s3=0  cout=0
time= 6   rst=0  clk=0  a0=0  a1=0  a2=1  a3=0  b0=1  b1=0  b2=0  b3=0  cin=0  s0=0  s1=1  s2=1  s3=0  cout=0
time= 8   rst=0  clk=0  a0=0  a1=0  a2=0  a3=1  b0=1  b1=1  b2=0  b3=0  cin=0  s0=0  s1=1  s2=1  s3=0  cout=0
time= 10  rst=0  clk=0  a0=0  a1=0  a2=0  a3=1  b0=1  b1=1  b2=0  b3=0  cin=0  s0=0  s1=1  s2=1  s3=0  cout=0
time= 12  rst=0  clk=0  a0=0  a1=0  a2=0  a3=1  b0=1  b1=1  b2=0  b3=0  cin=0  s0=0  s1=1  s2=1  s3=1  cout=0
time= 13  rst=0  clk=0  a0=0  a1=0  a2=0  a3=1  b0=1  b1=1  b2=0  b3=0  cin=0  s0=0  s1=1  s2=1  s3=1  cout=0
time= 15  rst=0  clk=0  a0=0  a1=0  a2=1  a3=1  b0=1  b1=0  b2=1  b3=1  cin=0  s0=0  s1=1  s2=1  s3=1  cout=0
time= 18  rst=0  clk=0  a0=0  a1=0  a2=1  a3=1  b0=1  b1=0  b2=1  b3=1  cin=0  s0=0  s1=1  s2=0  s3=1  cout=1
time= 20  rst=0  clk=0  a0=0  a1=0  a2=1  a3=0  b0=0  b1=1  b2=0  b3=0  cin=0  s0=0  s1=1  s2=1  s3=0  cout=0
time= 21  rst=0  clk=0  a0=0  a1=0  a2=1  a3=0  b0=0  b1=1  b2=0  b3=0  cin=0  s0=0  s1=1  s2=1  s3=0  cout=0
time= 24  rst=0  clk=0  a0=1  a1=0  a2=1  a3=0  b0=0  b1=1  b2=0  b3=0  cin=0  s0=1  s1=1  s2=1  s3=0  cout=0
time= 25  rst=0  clk=0  a0=0  a1=0  a2=0  a3=0  b0=1  b1=0  b2=1  b3=0  cin=0  s0=0  s1=1  s2=1  s3=1  cout=0
time= 27  rst=0  clk=0  a0=0  a1=0  a2=0  a3=0  b0=1  b1=0  b2=1  b3=0  cin=0  s0=0  s1=1  s2=0  s3=0  cout=0
time= 28  rst=0  clk=0  a0=0  a1=0  a2=0  a3=0  b0=1  b1=0  b2=1  b3=0  cin=0  s0=0  s1=1  s2=0  s3=0  cout=0
time= 33  rst=0  clk=1  a0=0  a1=1  a2=1  a3=0  b0=1  b1=0  b2=1  b3=0  cin=0  s0=1  s1=1  s2=0  s3=0  cout=0
time= 34  rst=0  clk=1  a0=0  a1=1  a2=1  a3=0  b0=1  b1=0  b2=1  b3=0  cin=0  s0=1  s1=1  s2=0  s3=0  cout=0
time= 35  rst=0  clk=1  a0=0  a1=1  a2=1  a3=0  b0=1  b1=0  b2=1  b3=0  cin=0  s0=1  s1=1  s2=0  s3=0  cout=0
time= 36  rst=0  clk=1  a0=0  a1=1  a2=1  a3=0  b0=1  b1=0  b2=1  b3=0  cin=0  s0=1  s1=1  s2=0  s3=0  cout=0
time= 39  rst=0  clk=1  a0=0  a1=0  a2=1  a3=1  b0=0  b1=0  b2=1  b3=1  cin=0  s0=0  s1=1  s2=0  s3=1  cout=1
time= 40  rst=0  clk=1  a0=0  a1=0  a2=1  a3=1  b0=0  b1=0  b2=1  b3=1  cin=0  s0=0  s1=1  s2=0  s3=1  cout=1
time= 43  rst=0  clk=1  a0=0  a1=0  a2=1  a3=0  b0=0  b1=1  b2=0  b3=1  cin=0  s0=0  s1=1  s2=1  s3=0  cout=0
time= 45  rst=0  clk=1  a0=0  a1=0  a2=1  a3=0  b0=0  b1=1  b2=0  b3=1  cin=0  s0=0  s1=1  s2=1  s3=1  cout=0
```

Fig. 36. Adder verilog simulation



Fig. 37. Adder gtk waveform

### B. D flip flop

For verilog simulation we take D flip flop and not TSPC flip flop.

```
time=          0   rst=1  clk=0  D=0  Q=0
time=          5   rst=1  clk=1  D=0  Q=0
time=         10   rst=0  clk=0  D=0  Q=0
time=         15   rst=0  clk=1  D=0  Q=0
time=         20   rst=0  clk=0  D=1  Q=0
time=         25   rst=0  clk=1  D=1  Q=1
time=         30   rst=0  clk=0  D=0  Q=1
time=         35   rst=0  clk=1  D=0  Q=0
time=         40   rst=0  clk=0  D=1  Q=0
time=         45   rst=0  clk=1  D=1  Q=1
time=         50   rst=0  clk=0  D=1  Q=1
time=         55   rst=0  clk=1  D=1  Q=1
time=         60   rst=0  clk=0  D=0  Q=1
time=         65   rst=0  clk=1  D=0  Q=0
time=         70   rst=0  clk=0  D=0  Q=0
```

Fig. 38. D flip flop verilog simulation



Fig. 39. D flip flop gtk waveform

### C. Circuit

```
time=          15  rst=0  clk=1  A=1010  B=0101  Cin=0  S=0000  Cout=0  DFF_A=1010  DFF_B=0101  DFF_S=1111  DFF_Cin=0
time=          20  rst=0  clk=0  A=1100  B=0011  Cin=0  S=0000  Cout=0  DFF_A=1010  DFF_B=0010  DFF_S=1111  DFF_Cin=0
DFF_Cout=0
time=          25  rst=0  clk=1  A=1100  B=0011  Cin=0  S=1111  Cout=0  DFF_A=1100  DFF_B=0011  DFF_S=1111  DFF_Cin=0
DFF_Cout=0
time=          30  rst=0  clk=0  A=0001  B=0110  Cin=0  S=1111  Cout=0  DFF_A=1100  DFF_B=0011  DFF_S=1111  DFF_Cin=0
DFF_Cout=0
time=          35  rst=0  clk=1  A=0001  B=0110  Cin=0  S=1111  Cout=0  DFF_A=0001  DFF_B=0010  DFF_S=0011  DFF_Cin=0
DFF_Cout=0
time=          40  rst=0  clk=0  A=1111  B=1111  Cin=0  S=1111  Cout=0  DFF_A=0001  DFF_B=0010  DFF_S=0011  DFF_Cin=0
DFF_Cout=1
time=          45  rst=0  clk=1  A=1111  B=1111  Cin=0  S=0011  Cout=0  DFF_A=1111  DFF_B=1111  DFF_S=1110  DFF_Cin=0
DFF_Cout=1
time=          50  rst=0  clk=0  A=0000  B=0000  Cin=0  S=0011  Cout=0  DFF_A=1111  DFF_B=1111  DFF_S=1110  DFF_Cin=0
DFF_Cout=0
time=          55  rst=0  clk=1  A=0000  B=0000  Cin=0  S=1110  Cout=1  DFF_A=0000  DFF_B=0000  DFF_S=0000  DFF_Cin=0
```

Fig. 40. Circuit verilog simulation



Fig. 41. Circuit gtk waveform

## XII. FPGA AND OSCILLOSCOPE WAVEFORM

We take inputs as  $A=1110$  and  $B=1011$  and on using FPGA and oscilloscope we get carry as 1,  $s_3=1$ ,  $s_2=0$ ,  $s_1=0$  and  $s_0=1$  which is our desired output. Hence our waveforms below are correct.



Fig. 42. Carry = 1



Fig. 43.  $s_3 = 1$  (yellow) and  $s_2 = 0$  (green)



Fig. 44.  $s_1 = 0$  (yellow) and  $s_0 = 1$  (green)



Fig. 45. Boolean board



Fig. 46. Setup of oscilloscope

## ACKNOWLEDGMENT

I extend my sincere gratitude Prof. Abhishek Srivastava for entrusting me with this remarkable project, which offered an invaluable opportunity for learning. Furthermore, I appreciate the invaluable guidance provided by the TAs throughout the duration of this project.

## REFERENCES

- [1] Digital logic and computer design by Morris mano.
- [2] Class Notes and Tutorial slides.

