



# Digital Design Project Report

## ECE 212s

### Team 19

| Name                                | ID      |
|-------------------------------------|---------|
| <b>Mohamed Abdou Ahmed</b>          | 2200206 |
| <b>Yousef Abd El-Fattah Mahmoud</b> | 2200260 |
| <b>Ahmed Mahmoud Fathy</b>          | 2200285 |
| <b>Marwan Abdallah Amin</b>         | 2200560 |
| <b>Moaaz Tarek Mohamed</b>          | 2200092 |
| <b>Mohamed Mahmoud Ibrahim</b>      | 2200456 |
| <b>Omar Ahmed Gamal</b>             | 2200196 |
| <b>Shihab Mohamed</b>               | 2201366 |
| <b>Karim Hosam Ahmed</b>            | 2201405 |

## Table of contributions

|                                          |                                                                             |
|------------------------------------------|-----------------------------------------------------------------------------|
| Implementation of base fundamental gates | Ahmed Mahmoud-Youssef Abdelfattah<br>Mohamed Abdou- Mohamed Mahmoud         |
| Logical operations modules               | Mohamed Abdou- Omar Ahmed                                                   |
| Arithmetic operations modules            | Marwan Abdallah- Mohamed Mahmoud - Yousef Abdelfattah                       |
| Delay Calculations                       | Shihab Mohamed- Moaaz Tarek<br>Mohamed Mahmoud                              |
| Power Calculations                       | Karim Hosam - Marwan Abdallah                                               |
| ALU Assembly                             | Mohamed Abdou- Marwan Abdallah-<br>Ahmed Mahmoud                            |
| Test-Benches                             | Ahmed Mahmoud- Yousef Abdelfattah                                           |
| Verilog                                  | Yousef Abdelfattah Mahmoud                                                  |
| Implementation of sequential circuit     | Moaaz Tarek- Ahmed Mahmoud-<br>Marwan Abdallah                              |
| Report                                   | Karim Hosam - Shihab Mohamed-<br>Omar Ahmed - Mohamed Abdou-<br>Moaaz Tarek |

Undersupervision of ENG. Amr Hamouda

# Introduction

This project report documents the design and implementation of a custom Arithmetic and Logical Unit (ALU). The primary objective of the project was to design an efficient ALU capable of performing a variety of arithmetic and logical operations using both HDL and transistor-level design.

## Project Description

The ALU designed in this project supports both arithmetic and logical operations based on a 4-bit control signal ( $\text{sel}<3:0>$ ), with the most significant bit determining the operation category (arithmetic or logic), and the remaining bits selecting the specific operation. The unit operates on two signed 4-bit inputs ( $a<3:0>$  and  $b<3:0>$ ) and outputs an 8-bit result.

## Implemented Features:

1. Arithmetic Operations: Increment, Decrement, Transfer, Add, Subtract, multiply.
2. Logical Operations: AND, OR ,XOR, XNOR, NAND, NOR, and 1's Complement.
3. VHDL/Verilog Code: Complete HDL implementation of the ALU and flip-flops.
4. Testbench: Comprehensive verification covering all operations.
5. Transistor-Level Design: Schematic and simulation using Cadence tools with 130nm CMOS technology.

6. Performance Metrics: Delay and power consumption measured at worst-case and maximum-frequency conditions.
7. Flip-Flops: Designed at both input and output stages for synchronization.
8. Simulation: Transient analysis confirming correct sampling at maximum operational frequency.

## Links:

CADENCE Files:

<https://drive.google.com/drive/u/0/folders/1yvG5f3DQXBb9tfuMXrkIo69znCAhpda>

## Inverter



We sized the inverter to be symmetric inverter as we used the ratio  $W_p=2W_n$

## Symbol



## Inverter testbench



# AND

## schematic



we sized each gate according to the reference inverter we made at first

## symbol



## Test Bench



## Wave Form



# NOR

## Schematic



## Symbol



## Test Bench



## Waveform



# OR schematic



## symbol



## Test Bench



## Wave Form



# XOR

## schematic



We used PTL (pass-Transistor Logic) because it's so much easier than C-mos implementation and uses less transistors and we solved the voltage drop problem by adding keeper transistor and we made its size so small because it's ratioed

## Symbol



## Test Bench



## Wave Form



## TGL Mux 2-1

### Schematic



We used TGL technology because it is the most famous and easiest family for implementation of MUX

### Symbol



## Tgl\_2in\_Mux testbench



## TGL Mux 16-1

### schematic



We cascaded 2-1 MUX to create the 16-1 Mux to reduce complications

### Symbol



## TGL Mux 16-1-TB



# Full Adder

## schematic



## Symbol



## Full\_Adder-TB



## 4bit Adder

### schematic



solved the overflow by x-oring last carry in with last carry out to implement overflow flag and we x-or the overflow with the last sum as and extended the output as it is the right bit

### Symbol



## 4bit\_Adder-TB



As shown  $a=6$ ,  $b=-6$ ,  $out \rightarrow 0$



## PTL OR

### schematic



### Symbol





## Sub



Same here, the overflow we put 2 XORs to solve the overflow and the extra here that we put B bar and put the first carry in equal 1 to generate the two's complement of B

## Symbol





## A Complement Module



## Symbol



## B Complement Module



## Symbol



## Decrement A-Module



## Symbol



## Increment A-Module



## Symbol



## NAND Module



symbol



## NOR Module



symbol



## XOR Module



## Symbol



## XNOR Module



## Symbol



## Mul



## TEST-Bench



## Mul Symbol



## Logical unit of the ALU



## Arithmetic unit of the ALU



## ALU





The inputs on the first four muxes were all filled as from input 8-15 it receives the bits of the logical operation from the logical unit but we needed extra for muxes as the output from the multiplication needed 8 bits so we normalized all outputs to be 8-bits and we made logical extension to the logical output and sign extended the last 3 bits from the addition and subtraction output

## Alu Symbol



## D-Q flip flop

### schematic



We used mux based positive edge trigger(master-slave)

## Symbol



## D-Q flip flop-TB



When the input (D)=1 and the positive clock edge comes the (Q) turns High



When the input (D)=0 and the positive clock edge comes the (Q) turns low

## Final Alu with load capacitance and flipflop



# Alu operations

## 1-Increment A

Selection lines  $\rightarrow 0000$      $A \rightarrow 0110 = 6$



the output as shown  $\rightarrow 0000\ 0111=7$

## 2-Increment B

Selection lines  $\rightarrow 0001$      $B \rightarrow 1000 = -8$



the output as shown  $\rightarrow 1111\ 1001 = -7$

### 3-Transfer A

Selection lines → 0010    A → 0110



the output as shown → 0000 0110 = 6

## 4-Transfer B

Selection lines → 0011    B → 1111 = -1



the output as shown → 1111 1111 = -1

## 5-Decrement A

Selection lines  $\rightarrow 0100$      $A \rightarrow 1111 = -1$



the output as shown  $\rightarrow 1111\ 1110 = -2$

## 6-Multiply

Selection lines → 0101    A → 1110 = -2    B → 0101 = 5



the output as shown → 1111 0110 = -10

## 7-Add

Selection lines → 0110    A → 0010 = 2    B → 0001 = 1



the output as shown → 0000 0011 = 3

## 8-sub

Selection lines  $\rightarrow 0111$    A  $\rightarrow 0010 = 2$    B  $\rightarrow 0001 = 1$



the output as shown  $\rightarrow 0000\ 0001 = 1$

For all logical tests below inputs:  $A \rightarrow 1100, B \rightarrow 0110$

### Complement A (selection line 1000)



### WaveForm



## Complement B (selection line 1001)



## WaveForm



## AND (selection line 1010)



## WaveForm



## OR (selection line 1011)



## WaveForm



## XOR (selection line 1100)



## WaveForm



## XNOR (selection line 1101)



## WaveForm



## NAND (selection line 1110)



## WaveForm



## NOR (selection line 1111)



## WaveForm



## Delay calculations:

First, we calculate  $t_{cq}$  which is the delay from clk to output once from high to low and others from low to high and taking the worst.



$$T_{cq} = 48.51 \text{ ps}$$

Second, we calculate worst case propagation delay



$$T_p = 1.35647 \text{ ns}$$

Third, we calculate tsu:  
 (we take the worst case)



$$T_{SU} = 86.5882 \text{ Ps}$$

Then,

$$\begin{aligned} T_{clk} (\text{min}) &= T_{cq} + T_{logic} + T_{su} \\ &= 48.51 + 1356.47 + 86.5882 = 1491.568 \text{ ps} \\ &= 1.491 \text{ ns} \end{aligned}$$

To have an error tolerance or in other words a bit of extra slack to avoid errors, we rounded up the clock period to 2ns

$$F(\text{max}) = 1/T_{clk} = 500 \text{ MHz}$$

We then proceeded with the stated clock period in the following test-benches of the whole system that include:

- 1. Inputs Flip Flops**
- 2. Arithmetic Unit**
- 3. Logical Unit**
- 4. MUX Control Unit**
- 5. Output Flip Flops**

# Transient Analysis

Test Cases of the final ALU to show the correct sampling of the flipflops

**1-SUB:** All Inputs of each A and B are bundled to two different voltage input lines for simplicity in this case



**2-XOR:** Inputs A01 & B23 are in on a different input voltage line from inputs A23 & B01



For the next two Cases, each input has its voltage line

### 3-OR:



### 4-Multiply:



## 5-Increment A



## 6-Increment B



## 7-Transfer A



## 8-Decrement A



## 9-Transfer B



## 10-Add



## 11-A Complement



## 12-B Complement



## 13-AND



## 14-X-NOR



## 15-NAND



## 16-NOR



## Notes including the Loading 2 pF capacitor

We started our design from the beginning with sizing of nanometer and taking a reference inverter of  $W_p/W_n = 320n/160n$ . Therefore, using a 2 pF capacitor as a load would result in a huge loading effect that increases the delay enormously.

The increase in the delay would lead to such a worse clock frequency than the one we obtained. So, since it is not a requirement in the project description, we preferred not to add it, and if we were to add a loading capacitance, it would be of Femto Farad scale

The following figures would illustrated the delay due to intrinsic delay of the inverter and the delay after using the loading capacitor.

### Intrinsic Delay:



$$T_{phl} = 8.24 \text{ ps}$$

## Loading Delay:



T<sub>phl</sub> after loading = 18 ns

This means that the delay became larger by a factor of 2185, which is huge due to the difference between the gates input capacitances and the 2 pF

# Power at Highest Frequency

We adjusted the clock to **1.5 nano seconds** to obtain the power in two most probable worst dynamic power. The cases are of the **Multiply and ADD** operations.

## Multiply Power

To get worst case power we made the inputs transition from  $0 \rightarrow 1$  and back from  $1 \rightarrow 0$  and took average power of simulation



It is observed that the Average power from Simulation is **0.7546 mW**

## ADD Power

Like Multiply we made the inputs transition from  $0 \rightarrow 1$  and back from  $1 \rightarrow 0$  and took average power of simulation



The average Power of ADD Operation came out ass **0.7558 mW**

# Power of ALU

The Dynamic power of **Multiply** and **ADD** operation came out relatively the same so we could fairly assume that average worst-case power is equal to about **755 uW or 0.755 mW**



## Verilog Code and Test-Bench

Note: The clock here is of 1.5ns as it isn't affected by noise as much as in cadence (Just for the sake of Optimization)

The codes are test-benched and configured on a clock period of 1.5 nsec

### **alu.v**

```
module alu_registered(
    input clk,
    input signed [3:0] A,
    input signed [3:0] B,
    input [3:0] opcode,
    output reg signed [7:0] result
);

reg signed [3:0] A_reg, B_reg;
reg signed [7:0] alu_result;

// Register inputs A and B
always @(posedge clk) begin
    A_reg <= A;
    B_reg <= B;
end

// Combinational ALU logic using registered inputs and combinational opcode
always @(*) begin
    case(opcode)
        4'b0000: alu_result = A_reg + 1;
        4'b0001: alu_result = B_reg + 1;
        4'b0010: alu_result = A_reg;
        4'b0011: alu_result = B_reg;
        4'b0100: alu_result = A_reg - 1;
        4'b0101: alu_result = A_reg * B_reg;
        4'b0110: alu_result = A_reg + B_reg;
    endcase
end
```

```

4'b0111: alu_result = A_reg - B_reg;
4'b1000: alu_result = ~A_reg;
4'b1001: alu_result = ~B_reg;
4'b1010: alu_result = A_reg & B_reg;
4'b1011: alu_result = A_reg | B_reg;
4'b1100: alu_result = A_reg ^ B_reg;
4'b1101: alu_result = ~(A_reg ^ B_reg);
4'b1110: alu_result = ~(A_reg & B_reg);
4'b1111: alu_result = ~(A_reg | B_reg);
default: alu_result = 0;
endcase
end

// Register the ALU output result on clk (single flip-flop)
always @(posedge clk) begin
    result <= alu_result;
end

endmodule

```

---

## alu\_tb.v

`timescale 1ns / 1ps

// Description: Testbench for ALU with registered inputs/outputs

module alu\_tb;

```

reg clk;
reg signed [3:0] A_t;
reg signed [3:0] B_t;
reg [3:0] opcode_t;
wire signed [7:0] result_t;

```

```

// Instantiate DUT
alu uut (
    .clk(clk),
    .A(A_t),
    .B(B_t),
    .opcode(opcode_t),
    .result(result_t)
);

// Clock generation: 670 MHz (T = 1.5 ns)
initial clk = 0;
always #0.75 clk = ~clk; // Toggle every 0.75ns → 1.5ns period

initial begin
    $display("Time | A | B | opcode | result");
    $display("-----|-----|-----|-----");
    A_t = 0; B_t = 0; opcode_t = 0;
    #5; // Wait for initial setup

    for(opcode_t = 4'b0000; opcode_t <= 4'b1111; opcode_t = opcode_t + 1) begin
        A_t = 4'b1110;
        B_t = 4'b0110;
        #3; // Wait for clock
        $display("%0t | %b %b | %b | %b", $time, A_t, B_t, opcode_t, result_t);
    end

    $stop;
end

endmodule

```

# Vivado Waveforms

The followings are Waveforms of Flip Floped ALU:



| A    | B    |  | opcode |  | result   |
|------|------|--|--------|--|----------|
| 1110 | 0110 |  | 0000   |  | 11111111 |
| 1110 | 0110 |  | 0001   |  | 00000111 |
| 1110 | 0110 |  | 0010   |  | 11111110 |
| 1110 | 0110 |  | 0011   |  | 00000110 |
| 1110 | 0110 |  | 0100   |  | 11111101 |
| 1110 | 0110 |  | 0101   |  | 11110100 |
| 1110 | 0110 |  | 0110   |  | 00000100 |
| 1110 | 0110 |  | 0111   |  | 11111000 |
| 1110 | 0110 |  | 1000   |  | 00000001 |
| 1110 | 0110 |  | 1001   |  | 11111001 |
| 1110 | 0110 |  | 1010   |  | 00000110 |
| 1110 | 0110 |  | 1011   |  | 11111110 |
| 1110 | 0110 |  | 1100   |  | 11111000 |
| 1110 | 0110 |  | 1101   |  | 00000111 |
| 1110 | 0110 |  | 1110   |  | 11111001 |
| 1110 | 0110 |  | 1111   |  | 00000001 |

# Final Vivado Schematic

