

A

## MINI PROJECT REPORT

ON

### “DESIGN AND IMPLEMENTATION OF HIGH SPEED 32-BIT VEDIC MULTIPLIER USING VERILOG HDL”

*Submitted in partial fulfillment of the requirements  
for the award of the degree*

**BACHELOR OF TECHNOLOGY (B. TECH)**

IN

**ELECTRONICS AND COMMUNICATION ENGINEERING**

**Submitted by**

K. SAI KARTHIK REDDY                    22R91A04D7

N. SAIKIRAN                                22R91A04H6

M. PRAVALIKA                              22R91A04G1

M. CHANDRAMOHAN REDDY              22R91A04E8

*Under the guidance of  
Mr. M.V.V. SATYA CHOWDARY  
Assistant Professor*



**DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING  
TEEGALA KRISHNA REDDY ENGINEERING COLLEGE**

**Medbowli, Meerpeta, Saroor Nagar, Hyderabad -097**

**Affiliated to JNTUH, Hyderabad**

**May, 2025**



# Teegala Krishna Reddy Engineering College

(UGC - Autonomous)

(Sponsored by TKR Educational Society)  
(Approved by AICTE, New Delhi and Affiliated to JNTUH, Accredited by NBA & NAAC 'A' Grade)

Medbowli, Meerpeta, Balapur, Hyderabad - 500 097  
Mobile : 84980 85218, E-mail : info@tkrec.ac.in, website : www.tkrec.ac.in



College Code : R9

## CERTIFICATE

This is to certify that the project entitled "**DESIGN AND IMPLEMENTATION OF HIGH SPEED 32-BIT VEDIC MULTIPLIER USING VERILOG HDL**" submitted by **K. SAI KARTHIK REDDY (22R91A04D7), N. SAIKIRAN (22R91A04H6), M. PRAVALIKA (22R91A04G1), M. CHANDRAMOHAN REDDY (22R91A04E8)**, in partial fulfillment of the requirements for the award of **BACHELOR OF TECHNOLOGY** in **ELECTRONICS AND COMMUNICATION ENGINEERING** in **JNTUH**, is a record of Bonafide work carried out under supervision and guidance.

### PROJECT GUIDE

**Mr. M.V.V. SATYA CHOWDARY**

**ASSISTANT PROFESSOR**

### HEAD OF THE DEPARTMENT

**Dr. SK. UMAR FARUQ**  
**PROFESSOR**

### EXTERNAL EXAMINER

**Date:**

**Place:**



# Teegala Krishna Reddy Engineering College

(UGC - Autonomous)

(Sponsored by TKR Educational Society)

(Approved by AICTE, New Delhi and Affiliated to JNTUH, Accredited by NBA & NAAC 'A' Grade)

Medbowli, Meerpeta, Balapur, Hyderabad - 500 097

Mobile : 84980 85218, E-mail : info@tkrec.ac.in, website : www.tkrec.ac.in



College Code : R9

## DECLARATION

We, **K. SAI KARTHIK REDDY (22R91A04D7), N. SAIKIRAN (22R91A04H6), M. PRAVALIKA (22R91A04G1), M. CHANDRAMOHAN REDDY (22R91A04E8)**, hereby declare that the project work entitled "**DESIGN AND IMPLEMENTATION OF HIGH SPEED 32-BIT VEDIC MULTIPLIER USING VERILOG HDL**" submitted to the Department of **ELECTRONICS AND COMMUNICATION ENGINEERING**, [Teegala Krishna Reddy Engineering College] in partial fulfillment of the requirements for the award of the degree of **BACHELOR OF TECHNOLOGY** in **ELECTRONICS AND COMMUNICATION ENGINEERING** by **JNTUH**, is a record of original work carried out by us under the guidance of **Mr. M.V.V. SATYA CHOWDARY, Assistant Professor**, during the academic year 2024-2025.

I further declare that this project has not formed the basis for the award of any degree, diploma, associate-ship, fellowship, or any other similar title or recognition previously.

**Date:**

**Place:**

(Signature of the Student)

Name: K. SAI KARTHIK REDDY  
Reg. No.: 22R91A04D7

(Signature of the Student)

Name: N. SAIKIRAN  
Reg. No.: 22R91A04H6

(Signature of the Student)

Name: M. PRAVALIKA  
Reg. No.: 22R91A04G1

(Signature of the Student)

Name: M. CHANDRAMOHAN REDDY  
Reg. No.: 22R91A04E8



# Teegala Krishna Reddy Engineering College

(UGC - Autonomous)

(Sponsored by TKR Educational Society)

(Approved by AICTE, New Delhi and Affiliated to JNTUH, Accredited by NBA & NAAC 'A' Grade)

Medbowli, Meerpet, Balapur, Hyderabad - 500 097

Mobile : 84980 85218, E-mail : info@tkrec.ac.in, website : www.tkrec.ac.in



College Code : R9

## ACKNOWLEDGEMENT

The satisfaction and euphoria that accompanies the successful completion of any task would be incomplete without the mention of the people who made it possible and whose encouragement and guidance have crowned our efforts with success.

First, we would like to thank our Mini Project guide **Mr. M.V.V. SATYA CHOWDARY, Assistant Professor** in Department of **E.C.E.**, for her inspiration, adroit guidance and constructive criticism for completion of our degree.

We would like to convey our special thanks to the Mini Project Co-Ordinator **Mrs. G. VIJAYA, Assistant Professor** in Department of **E.C.E.**, for her valuable guidance and suggestions in analyzing and testing throughout the period, till the end of this work completion.

Also, we would like to express our sincere gratitude to **Dr. SK. UMAR FARUQ, Professor** Head of Department in Electronics & Communication Engineering during the progress of the Project work, for his timely suggestions and help in spite of his busy schedule.

We would like to convey our special gratitude to the Vice-Principal and Dean of Academics at Teegala Krishna Reddy Engineering College **Dr. B. SRINIVASA RAO, Professor** in Department of **C.S.E.**, for his inspiration, adroit guidance and constructive criticism for completion of this work.

Our acknowledgement extended to Principal **Dr. K.V. MURALI MOHAN, Principal** of **TEEGALA KRISHNA REDDY ENGINEERING COLLEGE** for his consistent help and encouragement to complete the research work.

We are very much thankful to **TKR EDUCATIONAL SOCIETY** and our beloved Chairman **Sri TEEGALA KRISHNA REDDY Garu** for their help in providing good facilities in our college.

K.SAI KARTHIK REDDY  
N.SAIKIRAN  
M.PRAVALIKA  
M.CHANDRAMOHAN REDDY

22R91A04D7  
22R91A04H6  
22R91A04G1  
22R91A04E8

## ABSTRACT

The increasing demand for high-speed digital arithmetic operations in modern computing systems necessitates the development of efficient multiplication techniques. This work presents the design of a high-speed 32-bit binary Vedic multiplier based on the principles of ancient Vedic mathematics, particularly the Vedic Sutras. Vedic multiplication, renowned for its parallel processing capabilities, offers significant advantages over conventional multiplication techniques. The proposed design optimizes the addition and carry management stages, which are crucial for reducing the overall delay in the multiplication process, making it suitable for modern digital systems requiring high-speed arithmetic operations. The architecture of the 32-bit Vedic multiplier is implemented using Verilog Hardware Description Language (HDL), ensuring flexibility and scalability. The multiplier was simulated and synthesized using AMD Vivado to validate its performance in a simulated environment. Simulation results show that the 32-bit Vedic multiplier significantly reduces time delay compared to conventional multiplier designs. Additionally, optimized resource utilization further enhances its performance. The design is scalable, allowing it to be extended to larger bit-widths for higher-order multiplications, which is particularly advantageous for applications in Digital Signal Processing (DSP), cryptography, and other computationally intensive tasks.

## TABLE OF CONTENTS

|                                                                   |           |
|-------------------------------------------------------------------|-----------|
| <b>LIST OF FIGURES.....</b>                                       | <b>i</b>  |
| <b>LIST OF ABBREVIATIONS.....</b>                                 | <b>ii</b> |
| <b>CHAPTER 1.....</b>                                             | <b>1</b>  |
| <b>INTRODUCTION.....</b>                                          | <b>1</b>  |
| 1.1 INTRODUCTION.....                                             | 1         |
| 1.2 IMPORTANCE OF HIGH-SPEED MULTIPLIERS.....                     | 2         |
| 1.3 OVERVIEW OF VEDIC MATHEMATICS.....                            | 3         |
| 1.4 OBJECTIVE OF THE PROJECT.....                                 | 5         |
| 1.5 URDHVA TIRYAKBHYAM SUTRA EXPLANATION.....                     | 6         |
| <b>CHAPTER 2 .....</b>                                            | <b>10</b> |
| <b>LITERATURE SURVEY.....</b>                                     | <b>10</b> |
| 2.1 LITERATURE INTRODUCTION.....                                  | 10        |
| 2.2 MAIN CONCERNS AND REASONS.....                                | 10        |
| 2.3 PREVIOUS WORKS AND RESEARCH.....                              | 13        |
| <b>CHAPTER 3.....</b>                                             | <b>15</b> |
| <b>PROPOSED SYSTEM.....</b>                                       | <b>15</b> |
| 3.1 MODIFIED 32-BIT VEDIC MULTIPLIER DESIGN.....                  | 15        |
| 3.1.1 DESIGN IMPROVEMENTS OVER CONVENTIONAL DESIGN.....           | 15        |
| 3.2 USES OF CARRY SAVE ADDER.....                                 | 17        |
| 3.2.1 WHAT IS A CARRY SAVE ADDER.....                             | 17        |
| 3.2.2 HOW CARRY SAVE ARE USED IN THE 32-BIT VEDIC MULTIPLIER..... | 17        |
| 3.2.3 BENEFITS OF USING CSA IN THIS DESIGN.....                   | 18        |
| 3.3 OPTIMIZATION TECHNIQUES.....                                  | 18        |
| 3.4 FUNCTIONAL BLOCK DIAGRAM.....                                 | 20        |
| 3.5 SCHEMATIC DIAGRAM.....                                        | 22        |
| <b>CHAPTER 4.....</b>                                             | <b>23</b> |
| <b>SOFTWARE DESCRIPTION.....</b>                                  | <b>23</b> |
| 4.1 AMD VIVADO INSTALLATION PROCESS IN SYSTEMS.....               | 23        |
| 4.2 VEDIC MULTIPLIER SIMULATION PROCESS IN AMD VIVADO.....        | 25        |
| <b>CHAPTER 5.....</b>                                             | <b>27</b> |
| <b>RESULT.....</b>                                                | <b>27</b> |
| 5.1 RESULT.....                                                   | 27        |
| 5.2 ADVANTAGES.....                                               | 28        |
| 5.3 APPLICATIONS.....                                             | 29        |

|                                           |           |
|-------------------------------------------|-----------|
| <b>CHAPTER 6.....</b>                     | <b>30</b> |
| <b>CONCLUSION &amp; FUTURE SCOPE.....</b> | <b>30</b> |
| 6.1 CONCLUSION.....                       | 30        |
| 6.2 FUTURE SCOPE.....                     | 32        |
| <b>REFERENCES.....</b>                    | <b>36</b> |
| <b>APPENDIX.....</b>                      | <b>37</b> |
| CODE.....                                 | 37        |

## **LIST OF FIGURES**

| <b>FIG NO</b> | <b>NAME OF THE FIGURE</b>                     | <b>PAGE NO</b> |
|---------------|-----------------------------------------------|----------------|
| 3.4           | Functional Block Diagram                      | 17             |
| 3.5           | Schematic Diagram                             | 19             |
| 4.1.1         | AMD Vivado Interface                          | 20             |
| 4.1.2         | AMD Vivado Editions                           | 21             |
| 4.2.1         | Vivado Setup Window                           | 22             |
| 4.2.2         | Simulation Interface                          | 23             |
| 5.1           | Output Simulation For $a=15, b=3$             | 24             |
| 5.2           | Output Simulation For $a=65535, b=4294901760$ | 25             |

## **LIST OF ABBREVIATIONS**

| <b>Abbreviation</b> | <b>Full Form</b>                        |
|---------------------|-----------------------------------------|
| CPU                 | Central Processing Unit                 |
| GUI                 | Graphical User Interface                |
| VLSI                | Very Large Scale Integration            |
| CMOS                | Complementary Metal-Oxide-Semiconductor |
| ASIC                | Application-Specific Integrated Circuit |
| FPGA                | Field Programmable Gate Array           |
| RTL                 | Register Transfer Level                 |
| CSA                 | Carry Save Adder                        |
| TB                  | Testbench                               |
| MSB                 | Most Significant Bit                    |
| LSB                 | Least Significant Bit                   |
| ALU                 | Arithmetic Logic Unit                   |
| EDA                 | Electronic Design Automation            |
| HDL                 | Hardware Description Language           |
| SOP                 | Sum of Products                         |
| MIPS                | Million Instructions Per Second         |
| CAD                 | Computer-Aided Design                   |
| MUX                 | Multiplexer                             |

# CHAPTER 1

## INTRODUCTION

### 1.1 INTRODUCTION

In modern digital systems, multiplication is one of the most crucial and frequently used arithmetic operations. From processors and signal processing units to embedded systems and artificial intelligence hardware, the efficiency of multiplication directly influences the system's overall performance, power consumption, and area utilization. As applications continue to demand high-speed, low-power, and area-efficient hardware, the need for advanced multiplier architectures has become more prominent than ever.

Traditional binary multipliers, such as array and Wallace tree multipliers, have served their purpose effectively in numerous applications. However, these designs often involve a trade-off between speed, complexity, and area. With the emergence of high-throughput computing systems and real-time applications, conventional multiplication methods face significant limitations in terms of speed and resource efficiency, especially when scaled to higher bit-widths like 32-bit and beyond.

To overcome these limitations, researchers have increasingly turned toward unconventional and more optimized mathematical methodologies, among which Vedic mathematics has gained considerable traction. Rooted in ancient Indian mathematics, Vedic sutras offer highly efficient methods for arithmetic computations. One such sutra, Urdhva Tiryakbhyam (meaning “Vertically and Crosswise”), forms the basis for the Vedic multiplier. This technique significantly reduces the number of partial products and intermediate steps, enabling faster computations and more compact hardware realization.

The motivation behind this project stems from the desire to implement a high-speed, area-optimized 32-bit multiplier using Vedic mathematical principles and modern digital design practices. Furthermore, the standard Vedic multiplier design is enhanced by introducing Carry Save Adders (CSAs) and hierarchical modular construction, aimed at reducing critical path delays and resource bottlenecks. The integration of CSAs ensures that intermediate sums are processed concurrently, thus improving the overall propagation delay compared to traditional Ripple Carry Adder-based designs.

By designing a Modified High-Speed 32-bit Vedic Multiplier in Verilog HDL and testing it through simulation and synthesis tools such as AMD Vivado, this project aims to validate the theoretical speed and area benefits in a practical FPGA design environment. This multiplier design can serve as a foundational block in applications such as digital signal processors (DSPs), embedded control systems, and low-power VLSI designs.

The combination of ancient arithmetic wisdom with contemporary hardware engineering not only showcases the timeless value of Vedic mathematics but also provides a competitive edge in the development of high-performance arithmetic units for modern computing architectures.

## 1.2 IMPORTANCE OF HIGH-SPEED MULTIPLIERS

High-speed multipliers are indispensable components in modern computational systems. As digital systems evolve to handle increasingly complex tasks — from multimedia processing to real-time artificial intelligence and embedded control — the demand for efficient and high-speed arithmetic units, particularly multipliers, has grown exponentially.

At the core of many digital systems, multiplication plays a pivotal role in algorithms for digital signal processing (DSP), image and video compression, cryptography, machine learning, and scientific computations. In many of these applications, the multiplication operation is a bottleneck, often being the most time-consuming arithmetic operation. Therefore, optimizing the speed of multipliers has a direct and profound impact on overall system throughput and responsiveness.

In processors and digital hardware, multiplication frequently occurs in tight loops or time-critical sections of code. For instance, in digital filters or transform computations (e.g., FFT, DCT), a single delay in a multiplication step can propagate throughout the entire system, degrading real-time performance. Consequently, high-speed multipliers contribute not only to faster execution times but also to lower power consumption, as operations can be completed more quickly, allowing circuits to enter low-power states sooner.

Furthermore, in VLSI system design, performance metrics are often dictated by the critical path delay, which is the longest time required for data to propagate through combinational logic. Multipliers often lie on this critical path, especially in data paths involving matrix computations, convolution operations, or polynomial evaluations. Designing multipliers with minimal propagation delay ensures that the overall clock speed of the chip can be increased, thereby

improving performance without needing architectural overhauls.

In recent years, parallelism and pipelining have been used to accelerate multiplication, but these techniques come at the cost of increased silicon area and power usage. A well-optimized multiplier — such as one based on Vedic mathematics — can reduce the number of required logic gates and interconnections while preserving or enhancing speed. This results in compact, scalable, and low-power implementations, which are particularly valuable in mobile, battery-powered, and embedded environments.

Moreover, in hardware accelerators used for AI/ML inference (such as those in GPUs, TPUs, and custom ASICs), millions of multiplications are performed per second. Here, the efficiency of multipliers directly affects computational throughput and energy efficiency, two of the most critical factors in data center and edge computing applications.

In summary, high-speed multipliers are crucial for:

- Enhancing system speed and throughput
- Reducing power consumption and heat dissipation
- Improving area efficiency in VLSI implementations
- Enabling real-time performance in embedded systems
- Supporting scalability for high-performance applications.

The Modified Vedic Multiplier explored in this project directly addresses these demands, offering a balance of speed, area efficiency, and design simplicity, and serving as a high impact building block for next-generation digital systems.

### 1.3 OVERVIEW OF VEDIC MATHEMATICS

Vedic Mathematics is an ancient system of Indian mathematics derived from the Vedas — specifically the Atharva Veda. It was systematized in the early 20th century by Bharati Krishna Tirthaji, a Sanskrit scholar and mathematician, who reconstructed this system from ancient Hindu texts. He compiled 16 sutras (formulas) and 13 sub-sutras (sub formulas) that collectively offer quick, logical, and efficient methods for arithmetic computation.

Unlike conventional arithmetic, which often relies on step-by-step memorization and algorithmic rules, Vedic Mathematics emphasizes pattern recognition, mental calculations, and

visual strategies. These methods are not only elegant and intuitive but also lend themselves naturally to implementation in digital logic, especially in scenarios where speed and resource efficiency are paramount. Among the various sutras, one stands out as especially powerful for multiplication: "Urdhva Tiryakbhyam", meaning Vertically and Crosswise.

This method enables a vertical and crosswise calculation of partial products, followed by their summation to get the final result. It allows simultaneous generation of all partial products, enabling parallel processing and thereby reducing computation time — a property highly desirable in hardware design.

### **Key Features of Vedic Mathematics:**

- **Speed and Simplicity:** Operations are completed in fewer steps than traditional algorithms, particularly multiplication, division, and squaring.
- **Versatility:** Applicable to both simple arithmetic and complex algebraic operations.
- **Mental Computation:** Most techniques can be executed mentally or with minimal written work, fostering deep mathematical insight.
- **Generalization:** Techniques are easily extended from basic to advanced computations, including algebraic identities, polynomials, and even calculus.

### **Relevance in Digital Design:**

What makes Vedic Mathematics particularly suitable for digital hardware is its systematic decomposition of problems into smaller, regular operations — a structure that maps directly to modular and hierarchical digital logic. Specifically, in the context of this project:

- The Urdhva Tiryakbhyam Sutra enables the design of multipliers that are faster and more area-efficient than traditional binary multipliers.
- Vedic methods avoid complex carry propagation across the entire width of the numbers, which reduces delay.
- The approach promotes parallelism, a key advantage in FPGA and ASIC implementations.

When combined with modern design strategies like Carry Save Adders (CSAs) and pipelined architecture, Vedic Mathematics opens up a path to constructing multipliers that are not only mathematically sound but also architecturally superior. In essence, Vedic Mathematics serves as a

bridge between ancient wisdom and modern engineering.

## 1.4 OBJECTIVE OF THE PROJECT

The primary objectives of this project are as follows:

- **Design and Implementation of a 32-bit Vedic Multiplier**

To design a high-speed 32-bit Vedic multiplier using Vedic mathematics, specifically the Urdhva Tiryakbhyam sutra. This approach aims to minimize the delay and increase the computational efficiency of the multiplier.

- **Optimization of the Multiplier's Performance**

To optimize the multiplier's performance by utilizing hierarchical module designs (2-bit, 4-bit, 8-bit, 16-bit, and 32-bit) to achieve faster computation times and lower resource usage in FPGA/ASIC implementations.

- **Integration of Efficient Arithmetic Components**

To integrate advanced arithmetic components, such as Carry Save Adders (CSAs), Half Adders, and Full Adders, in the hierarchical structure of the multiplier for effective carry management and fast addition operations.

- **Improved Speed through Modified Architecture**

To modify the traditional Vedic multiplier design by introducing speed-enhancing techniques like parallelism and optimized logic gates to further reduce latency and increase throughput in high-performance applications.

- **Verilog HDL Implementation and Simulation**

To implement the 32-bit Vedic multiplier in Verilog HDL and perform simulations to validate its functionality, ensuring correct behavior at different stages of the hierarchical design, from 2-bit to 32-bit multiplications.

- **Hardware Resource Utilization and Analysis**

To evaluate the hardware resource usage, such as logic elements, flip-flops, and

memory, and compare the proposed design with conventional multipliers in terms of area, delay, and power consumption.

- **Application in High-Speed Computing Systems**

To demonstrate the potential use of the high-speed 32-bit Vedic multiplier in high-performance computing systems, including those used in digital signal processing, cryptographic applications, and real-time image/video processing.

## 1.5 URDHVA TIRYAKBHYAM SUTRA EXPLANATION

### Urdhva Tiryakbhyam Sutra Explained

The Urdhva Tiryakbhyam Sutra, which translates to "Vertically and Crosswise," is one of the most fundamental and widely used Sutras in Vedic Mathematics. It is primarily applied in the multiplication of multi-digit numbers, simplifying the process by breaking it down into smaller, more manageable steps.

### Key Concept of Urdhva Tiryakbhyam

The main idea of this Sutra is to perform multiplication by combining two methods: vertical multiplication and crosswise multiplication. This technique involves multiplying digits from the numbers in a systematic way, generating intermediate results (partial products) that are then added together to get the final product. The beauty of this approach lies in its ability to reduce the complexity of traditional multiplication and make it more efficient, especially when dealing with larger numbers.

The Sutra is applicable to any number of digits and can be used for both simple and complex multiplications, such as multiplying two 2-digit numbers, 3-digit numbers, or even larger numbers. It eliminates the need for long multiplication steps and reduces the possibility of errors in manual calculations.

### Step-by-Step Explanation of Urdhva Tiryakbhyam Sutra

Let's break down the process of using the Urdhva Tiryakbhyam Sutra with an example of multiplying two 2-digit numbers, say  $23 \times 41$ .

### A. Setup

Write the two numbers to be multiplied:  $23 \times 41$

### B. Vertical Multiplication

The first step is to multiply the digits vertically (i.e., multiplying the digits that are aligned in the same column). This gives us the following:

- Multiply the rightmost digits (units):

$$3 \times 1 = 3$$

- Multiply the leftmost digits (tens):

$$2 \times 4 = 8$$

This gives us the two partial products: 8 (from  $2 \times 4$ ) and 3 (from  $3 \times 1$ ).

### C. Crosswise Multiplication

Now, move on to the crosswise multiplication. This involves multiplying the digits diagonally (crosswise) and adding the results:

- Multiply the leftmost digit of the first number with the rightmost digit of the second number:

$$2 \times 1 = 2$$

- Multiply the rightmost digit of the first number with the leftmost digit of the second number:

$$3 \times 4 = 12$$

- Now, add these two crosswise products:

$$2 + 12 = 14$$

## D. Add the Results

Now, we can add the partial products from the previous steps:

- From the vertical multiplication: 8 and 3.
- From the crosswise multiplication: 14.

We place these results in the appropriate columns, starting from the rightmost place:

- (from  $3 \times 1$ )
- 14 (from  $2 \times 4 + 3 \times 1$ )
- (from  $2 \times 4$ )

The final result is 943, which is the result of  $23 \times 41$ .

## Example with Larger Numbers

Let's apply the Urdhva Tiryakbhyam Sutra to a slightly larger example:  $123 \times 456$ .

### Step 1: Setup

Write the numbers to be multiplied:

$$123 \times 456$$

### Step 2: Vertical Multiplication

Multiply the rightmost digits (units):  $3 \times 6 = 18$  .

Multiply the second digits:  $2 \times 5 = 10$   $1 \times 4 = 4$

Multiply the leftmost digits:  $1 \times 6 = 6$

### Step 3: Crosswise Multiplication

Multiply diagonally (leftmost of one number with the rightmost of the other and so on):

- $1 \times 5 = 5$

- $2 \times 4 = 8$
- $3 \times 3 = 9$

Add the crosswise results together:  $5 + 8 + 9 = 22$

#### **Step 4: Add the Results**

Now, arrange all the results (after adding intermediate carries where applicable):

- \* 18 (from  $3 \times 6$ )
- \* 22 (from  $2 \times 5 + 3 \times 4$ )
- \* 6 (from  $1 \times 6$ )
- \* 10 (from  $2 \times 6$ )
- \* 4 (from  $1 \times 5$ )

The final result is 56088, which is the result of  $123 \times 456$ .

## CHAPTER 2

### LITERATURE SURVEY

#### 2.1 LITERATURE INTRODUCTION

In the evolving landscape of digital system design, the efficiency of multiplication operations plays a critical role in enhancing the performance of processors, especially in applications like signal processing, encryption, and multimedia. Traditional multiplier architectures, such as Array, Wallace Tree, and Booth, often face challenges related to speed, power consumption, and hardware complexity. To address these issues, researchers have increasingly turned to Vedic mathematics, an ancient system based on 16 sutras, which offers innovative and efficient multiplication techniques like Urdhva Tiryagbhyam and Nikhilam. These methods enable faster computation, better scalability, and improved area and power optimization, making them highly suitable for implementation in Verilog or VHDL and synthesis on FPGA platforms using tools such as Xilinx ISE and ModelSim. The research also emphasizes the importance of modular design for scalability, efficient adder structures for improved performance in cryptographic applications, and comparative studies to benchmark Vedic multipliers against conventional ones. While some included studies diverge into unrelated fields like MANETs, the core focus remains on optimizing arithmetic logic through modified high-speed multiplier designs grounded in Vedic principles.

#### 2.2 MAIN CONCERNS AND REASONS

##### 1. Need for High-Speed and Efficient Multiplication

- **Concern:** Modern digital systems, particularly ALUs and DSPs, demand extremely fast multiplication operations to keep up with real-time processing requirements
- **Reason:** Multiplication is a core operation in many computationally intensive tasks (e.g., image processing, encryption, signal processing), and traditional approaches (like array or Wallace tree multipliers) often result in increased propagation delay and hardware complexity. Vedic multipliers offer reduced delay through parallelism and fewer logic levels, which is why they are being explored extensively.

## 2. Scalability of Multiplier Architectures

- **Concern:** Designs must scale efficiently from smaller bit widths (4, 8, 16 bits) to higher ones (32-bit and beyond) without a significant rise in delay or hardware usage.
- **Reason:** As systems evolve, the need for handling larger data sizes grows. The modularity in Vedic algorithms (especially Urdhva Tiryagbhyam) allows for seamless scalability using repetitive structural blocks, ensuring performance does not degrade sharply with increased operand sizes.

## 3. Optimization of Area and Power Consumption

- **Concern:** Multiplier circuits must consume less silicon area and power, especially in mobile, embedded, and cryptographic applications.
- **Reason:** Low-power designs are vital for battery-operated devices and high-density FPGAs. Vedic multipliers are preferred as they reduce the number of partial products and adders needed, leading to more compact and power-efficient implementations.

## 4. Hardware Implementation Suitability (Verilog/VHDL)

- **Concern:** Practical implementation requires that these algorithms be efficiently coded in hardware description languages (HDLs) like Verilog or VHDL and synthesized on FPGAs.
- **Reason:** To deploy these designs in real-world applications, simulation and synthesis are necessary. Papers focus on using tools like Xilinx ISE and ModelSim to validate timing, area, and functional correctness.

## 5. Comparison with Existing Multiplier Architectures

- **Concern:** Vedic multipliers must be benchmarked against other popular architectures such as Booth, Wallace Tree, and Array multipliers to justify their benefits.
- **Reason:** A comparative analysis helps identify performance trade-offs. Vedic multipliers have been found to outperform in speed and resource utilization but might be less effective in certain fixed-width precision cases without further optimization.

## 6. Use of Modified Algorithms to Enhance Accuracy and Reduce Errors

- **Concern:** In designs like the Modified Booth multiplier, truncation errors can lead to inaccuracies in fixed-width arithmetic used in DSPs and multimedia applications.
- **Reason:** Precision is critical in applications like image/video processing. Modified algorithms that include error compensation or adjusted partial product matrices help strike a balance between performance and accuracy.

## 7. Inclusion of Coprocessors for Task Offloading

- **Concern:** Main processors are overburdened with computational tasks, especially in multicore environments.
- **Reason:** By offloading specific tasks such as multiplication to dedicated hardware multipliers or coprocessors, the system throughput increases and power consumption is reduced. This underlines the importance of fast, standalone multipliers.

## 8. Relevance of Adders in Overall Multiplier Design

- **Concern:** Adder design plays a crucial role in optimizing final multiplication performance, especially in Galois Field (GF) arithmetic.
- **Reason:** Efficient binary adders (e.g., carry-save adders for GF( $p$ ) and GF( $2^n$ )) are necessary to maintain speed and reduce complexity, particularly in cryptographic circuits where unified support for different fields is needed.

## 9. Peripheral Topic on MANETs (Irrelevant to Core Subject)

- **Concern:** The inclusion of a study on MANETs using AOMDV routing protocol is unrelated to Vedic multipliers or digital arithmetic.
- **Reason:** While energy efficiency is a common goal, this paper focuses on networking issues like routing, malicious node detection, and QoS metrics, which are outside the scope of arithmetic logic design. Its inclusion may dilute the focus of a study centered on Vedic multipliers unless justified under a broader digital systems umbrella.

## 2.3 PREVIOUS WORKS AND RELATED RESEARCH

[1] "Modified High Speed 32-bit Vedic Multiplier: Design and Implementation," The proposed research presents a modified binary Vedic multiplier based on ancient Vedic mathematics sutras, offering improvements over previously implemented designs. It demonstrates enhanced performance in terms of reduced time delay and better device utilization. The design is implemented in Verilog HDL, simulated using Model Sim, and synthesized with Xilinx tools. While simulations were performed for 4, 8, 16, and 32-bit multiplications, results are specifically shown for the 32-bit case. The technique is scalable for larger bit sizes and has been compared with existing Vedic multipliers to highlight its advantages. [2] "FPGA Implementation of 32-bit Vedic Multiplier using Verilog HDL," To handle increasingly complex tasks, modern processors use multiple cores, which increases processing load. This can be alleviated by assigning specific tasks, like signal processing, to coprocessors. Since the speed of the Arithmetic Logic Unit (ALU) largely depends on the efficiency of the multiplier, fast multipliers are essential. Vedic mathematics, based on 16 sutras, offers efficient multiplication techniques that reduce area, power consumption, and delay in processors. Two key Vedic algorithms—Urdhva Tiryagbhyam and Nikhilam—have proven more efficient than traditional methods like the Wallace tree multiplier. These multipliers were implemented using Verilog HDL and synthesized using Xilinx ISE. [4] "VHDL implementation of fast NxN multiplier based on Vedic mathematics." The novel digital multiplication technique that differs from conventional add-and-shift methods. It supports modular design, allowing smaller blocks to be combined for larger multipliers, which is beneficial for VHDL implementation using structural modelling. The proposed approach is a general technique for NxN bit multiplication, offering reduced computation time for obtaining results. [5]"A Comparative Study on Different Multipliers This section explores the importance and implementation of multiplication in digital systems, particularly its role in signal processing and ALUs. It reviews various multiplier types—such as Array, Wallace Tree, Booth, and Vedic multipliers—and evaluates them on speed, area, power, and complexity. The multiplication process is described in three main stages: generating partial products, reducing them using adders, and performing a final addition. Efficiency in multipliers is essential for optimizing computation speed, area, and power usage, making them a critical focus of ongoing research in digital design.[8] “Adder for unified GF(p) and GF(2n) Galois field Multipliers”, This

introduces an efficient binary adder that can perform carry-save addition in both GF( $p$ ) and GF( $2^n$ ) fields without needing an external control signal. This makes it ideal for cryptographic applications, particularly in unified Galois field multipliers, by reducing complexity and improving hardware efficiency.

[9]"High Speed Vedic Multiplier for Digital Signal Processors" The proposed method focuses on enhancing the speed of Digital Signal Processors (DSPs) by utilizing a Vedic multiplication algorithm based on the Urdhava Tiryakbhyam method, which translates to "vertically and crosswise." This method is derived from Vedic mathematics, a system based on 16 sutras, rediscovered in the early 20th century. Traditionally, these sutras were used for efficient decimal multiplication in ancient India. By applying the same principles to binary multiplication, the algorithm aims to reduce the time delay in digital hardware systems.[10] FPGA Implementation of Modified Booth Multiplier. The paper presents the design of a high-speed multiplier with an error compensation technique aimed at reducing truncation errors in fixed-width multipliers, commonly used in multimedia and digital signal processing systems. Fixed-width multipliers are preferred for their ability to maintain a consistent format with minimal accuracy loss. To address truncation errors, the paper proposes modifying the partial product matrix of Booth multiplication and introducing an error compensation function.[11] "Demand Based Energy Efficient Topology In Manets Using Aomdv Routing Protocol" The research focuses on evaluating energy efficiency in Mobile Ad-Hoc Networks (MANETs), which face challenges such as dynamic topology changes, limited energy, and lack of existing infrastructure support. A key issue in MANETs is the presence of malicious nodes that disrupt routing protocols. The study uses the AOMDV routing protocol and evaluates its performance through simulations with tools like NS2, NAM, and AWK. The Quality of Service (QoS) parameters analyzed include throughput, packet loss, jitter, and energy consumption. The simulation incorporates the impact of malicious nodes appearing at different time.

## CHAPTER 3

### PROPOSED SYSTEM

#### 3.1 Modified 32-bit Vedic Multiplier Design

##### 3.1.1 Design Improvements Over Conventional Design:

The Modified 32-bit Vedic Multiplier introduces significant architectural and performance improvements over traditional binary and even standard Vedic multiplication approaches. These enhancements are made to meet the demands of high speed arithmetic in modern digital systems such as embedded processors, DSPs, and real-time automotive control units. Below are the key improvements incorporated in the modified design:

###### 1. Hierarchical and Modular Architecture:

One of the most critical advancements is the strictly hierarchical design, where the 32 bit multiplier is constructed from smaller Vedic modules:

- 2-bit → 4-bit
- 4-bit → 8-bit
- 8-bit → 16-bit
- 16-bit → 32-bit

This recursive modular construction ensures that each module is thoroughly tested and optimized before being used in the larger design. It allows easier debugging, verification, and synthesis. Unlike conventional flat designs, this hierarchy maintains consistency, readability, and scalability.

###### 2. Use of Carry Save Adders (CSAs):

In conventional designs, carry propagation is a major bottleneck in addition stages. The modified design smartly integrates Carry Save Adders (CSAs) in intermediate stages to postpone carry propagation until the final stage. Advantages of CSAs in this architecture:

- Enables parallel computation of partial sums and carries.
- Minimizes delay caused by serial carry propagation.
- Allows faster accumulation of partial products in the final summation stage.

###### 3. Improved Partial Product Management:

Instead of relying on a rigid shift-and-add mechanism, the modified design dynamically manages partial products by:

- Carefully aligning them using predetermined logic.

- Efficiently feeding them into dedicated adder modules.

This strategy reduces the gate-level switching activity, which improves both speed and power consumption.

#### **4. Optimized Adders for Final Stage Summation:**

The final stage uses high-performance adders (like Ripple Carry or Kogge-Stone in more aggressive variants) depending on the critical path requirements.

#### **5. Parallelism and Pipeline Readiness:**

Although not pipelined in its basic form, the modified 32-bit design is pipeline-ready. Each hierarchical module can be pipelined independently if timing closure demands it. The clear module boundaries make it ideal for pipelining in ASIC/FPGA implementations, which can drastically improve the throughput.

#### **6. Reduced Logic Depth:**

By ensuring that each stage performs a limited and well-defined computation, the overall logic depth is minimized. This contributes to:

- Reduced propagation delay.
- Better performance at higher clock frequencies.
- Improved timing closure during synthesis.

#### **7. Hardware Resource Efficiency:**

The design is optimized for resource-constrained environments. Through reuse of modules and shared logic blocks:

- Gate count is minimized.
- Power consumption is reduced.
- Area footprint is lowered without compromising speed.

Compared to conventional designs that often replicate logic for partial products or use complex multiplier circuits like Booth or Wallace trees, this Vedic approach provides a superior performance-to-area ratio.

#### **8. Suitability for FPGA and ASIC Realization:**

The modified 32-bit Vedic multiplier is structured to be easily mapped onto FPGA LUTs and logic slices. On ASICs, its modular nature allows effective placement and routing with minimal congestion. It is not just functionally efficient but also implementation-aware—a major advantage for production-level system.

### 3.2 Uses of Carry Save Adder

The integration of Carry Save Adders (CSAs) in the Modified 32-bit Vedic Multiplier design plays a crucial role in enhancing both the speed and efficiency of arithmetic operations, especially during the accumulation of partial products. Traditional addition in binary multiplication involves sequential carry propagation, which significantly increases the delay in high-bit-width multipliers. CSAs effectively eliminate this bottleneck by deferring carry propagation to the final stage.

#### 3.2.1 What is a Carry Save Adder?

A Carry Save Adder is a type of adder used to sum three or more binary numbers simultaneously without immediately propagating the carry. Instead of producing a single binary sum, a CSA outputs two separate results:

- A sum vector (partial sum).
- A carry vector (carry bits without being propagated).

These two vectors can be added in a final stage using a regular adder (such as a Ripple Carry Adder) to obtain the final result. The major benefit is that intermediate carries do not have to ripple through the entire bit-width during intermediate stages, which greatly reduces computation time.

#### 3.2.2 How CSAs Are Used in the 32-bit Vedic Multiplier?

During Vedic multiplication, especially at higher bit-widths like 32-bit, multiple partial products are generated at various hierarchical levels (from 2-bit to 16-bit modules). These partial products need to be summed efficiently. Here's how CSAs fit into this process:

##### 1. Parallel Accumulation of Partial Products:

When three partial products from smaller modules (say, 8-bit or 16-bit outputs) need to be added, a CSA adds them simultaneously and produces two outputs: a partial sum and a partial carry.

##### 2. Delay Optimization:

Since CSAs do not propagate the carry during intermediate stages, they reduce the critical path delay significantly. Only the final addition, performed using a Ripple Carry Adder or other optimized adder, handles the full carry propagation, minimizing delay overhead.

##### 3. Bit-width Scalability:

CSAs are used recursively as the multiplier scales:

- At 8-bit, 16-bit, and 32-bit levels, each stage involves more partial products.
- CSAs help keep the depth of the adder tree manageable and the propagation delay.

#### 4. Final Addition:

After all CSAs have performed their tasks, the resulting sum and carry vectors from the last stage are passed into a final adder to produce the complete 32-bit product.

##### 3.2.3 Benefits of Using CSA in This Design:

###### i. High-Speed Operation:

The major advantage of CSAs is speed. They allow multiple additions to happen in constant time, avoiding the delay of serial carry propagation.

###### ii. Efficient for Hardware Realization:

In FPGAs or ASICs, CSAs map efficiently to logic blocks. Their fixed structure makes them suitable for parallel hardware implementation, saving area and power.

###### iii. Power Optimization:

By reducing switching activity in the carry propagation path, CSAs contribute to lower dynamic power consumption.

###### iv. Predictable Timing:

CSAs provide a predictable and consistent delay profile, which is beneficial when designing for timing closure in larger systems.

### 3.3 Optimization Techniques

The Modified 32-bit Vedic Multiplier incorporates a range of optimization techniques that significantly enhance its performance, area efficiency, and suitability for practical hardware implementation. These optimizations address critical concerns such as delay reduction, power consumption, and modular scalability, ensuring that the multiplier is capable of meeting the rigorous demands of modern digital systems.

Below are the key optimization strategies implemented in the design:

#### 1. Hierarchical Decomposition

The multiplier leverages hierarchical decomposition, breaking down a complex 32-bit operation into smaller, manageable units:

- 32-bit → 16-bit modules
- 16-bit → 8-bit modules
- 8-bit → 4-bit modules
- 4-bit → 2-bit modules

This approach reduces design complexity and allows for parallel processing of smaller multipliers. Each lower-level module is optimized and reused, making the entire system modular and scalable.

## **2. Use of Carry Save Adders (CSAs):**

One of the core optimizations is the integration of CSAs during the accumulation of partial products. Instead of waiting for carry propagation at each step, CSAs allow intermediate sums and carries to be processed in parallel, significantly reducing critical path delay. The final sum and carry vectors are merged only once at the end, drastically improving speed.

## **3. Balanced Partial Product Generation:**

The design ensures that partial products are evenly distributed and aligned, avoiding skewed additions that would otherwise lead to longer combinational paths. This balanced layout minimizes delay and avoids congested logic paths during synthesis.

## **4. Minimized Gate Count via Logic Reuse:**

To keep the area footprint low, the multiplier is designed with a focus on logic reuse. Smaller multipliers are instantiated multiple times rather than duplicating logic. Custom designed half adders, full adders, and CSAs are reused across the design, ensuring minimal redundant circuitry.

## **5. Pipelining Compatibility:**

Though not inherently pipelined, the modular nature of the design makes it pipelining friendly. The clear demarcation between hierarchical levels allows for the easy insertion of pipeline registers. This means the multiplier can be adapted for high-frequency applications simply by pipelining stages to balance delays.

## **6. Power-Efficient Switching Logic:**

The partial products and intermediate addition stages are organized to minimize signal switching activity, a major contributor to dynamic power consumption. The use of CSAs also helps by reducing unnecessary signal toggling due to carry propagation, thereby making the circuit more energy-efficient.

## **7. Optimized Final Adder Selection:**

The final summation stage is designed to use a suitable adder based on timing constraints:

- Ripple Carry Adder for area-efficient designs.
- Carry Look-Ahead Adder or Kogge-Stone Adder for high-speed applications.
- This flexibility ensures that the multiplier can be tuned to meet specific design goals.

## 8. Synthesis and Technology Mapping Awareness:

The logic and module boundaries are defined in a way that aligns well with FPGA logic blocks (LUTs) and ASIC standard cells. This ensures that when the design is synthesized:

- Placement and routing are efficient.
- Timing violations are minimized.
- Utilization of resources is optimal.

## 3.4 FUNCTIONAL BLOCK DIAGRAM



**Fig 3.4.: Functional Block Diagram**

The architecture of a 32-bit Vedic multiplier using the Urdhva Tiryakbhyam method of Vedic mathematics, which has been optimized for high-speed multiplication by breaking the 32-bit multiplication into smaller 16-bit multiplications and combining their results using Carry Save Adders (CSAs) and adders.

### Description of the Architecture :

This 32-bit multiplier is divided into four 16-bit Vedic multipliers and uses 32-bit Carry Save Adders (CSAs) and 16-bit adders for result combination  $S[63:0]$ .

**A. Input Division:**

Two 32-bit inputs A[31:0] and B[31:0] are divided into high and low 16-bit parts:

- A[31:16], A[15:0]
- B[31:16], B[15:0]

**B. 16-Bit Vedic Multiplications:**

Four 16-bit Vedic multipliers compute partial products:

- $A[15:0] \times B[15:0] \rightarrow$  contributes to S[15:0]
- $A[15:0] \times B[31:16]$
- $A[31:16] \times B[15:0]$
- $A[31:16] \times B[31:16]$

**C. Partial Product Accumulation with CSAs:**

The results of cross-multiplications ( $A[15:0] \times B[31:16]$  and  $A[31:16] \times B[15:0]$ ) are added using 32-bit Carry Save Adders (CSAs). These results are again combined with the shifted results from the lower multiplication using another CSA.

**D. Final Stage Combination:**

The output from the topmost Vedic multiplier ( $A[31:16] \times B[31:16]$ ) is added to the carry outputs using a 16-bit adder, and an OR gate helps propagate correct bits. The final output is split as:

- S[15:0] → From  $A[15:0] \times B[15:0]$
- S[47:16] → From the middle CSAs
- S[63:48] → From the top 16-bit adder

### 3.5 SCHEMATIC DIAGRAM



**Fig 3.5:** Schematic Diagram

The provided schematic represents a 32-bit Vedic multiplier implemented using a modular and hierarchical approach, designed to produce a 64-bit output. The circuit takes two 32-bit inputs, labeled A[31:0] and B[31:0], which are each divided into two 16-bit segments. The design employs four instances of a 16-bit Vedic multiplier module to compute the partial products: V0 handles the lower 16 bits of both inputs, V1 and V2 compute the cross-products between the higher and lower halves, and V3 multiplies the upper 16 bits of both inputs. These four partial results are then efficiently combined using two 32-bit carry-save adders (CSA1 and CSA2), a logical OR unit, a left shifter, and a 16-bit adder module.

The CSA modules play a critical role in adding the intermediate results without propagating carry immediately, which enhances speed. The output of V3 is shifted left by 32 bits using a left shift block to align it properly in the final 64-bit result. The carry and sum from CSA1, along with the shifted output of V3, are added in CSA2. Meanwhile, the result of V0 and a logical OR operation are summed using a 16-bit adder.

Ultimately, all partial products and intermediate results are aggregated to produce the final 64-bit multiplication result, S[63:0]. This design showcases a high-speed multiplication architecture using Vedic principles, optimized through parallelism and modular design.

## CHAPTER 4

### SOFTWARE DESCRIPTION

#### 4.1 AMD VIVADO INSTALLATION PROCESS IN SYSTEMS

To work with Xilinx FPGAs, the AMD Vivado Design Suite must be installed. Below are the steps to properly install Vivado on a Windows-based system:

##### 1. System Requirements

Before installing, ensure your system meets the following:

- Operating System: Windows 10 (64-bit) or Linux (64-bit)
- RAM: Minimum 8 GB (16 GB recommended)
- Disk Space: At least 50 GB free
- Internet Connection: Required for downloading files and licensing

##### 2. Download Vivado Installer

- Visit the official AMD/Xilinx website: <https://www.xilinx.com>
- Navigate to Support > Downloads
- Select the appropriate version of Vivado (e.g., Vivado 2023.2)
- Choose Full Installer for offline installation or Web Installer for online installation
- Create an account or log in to proceed with the download.



**Fig 4.1.1:** AMD Vivado Interface.

### 3. Run the Installer

- Locate the downloaded .exe or .bin file
- Right-click and select Run as administrator
- Follow the on-screen prompts

### 4. Select Installation Options

- Choose Vivado HDL Design Edition or System Edition (depending on project needs)
- You can select additional options like:
  - Vivado Lab Edition
  - Vitis for embedded software development,Cable drivers
- Select installation path (default is usually C:\Xilinx).



**Fig 4.1.2:** AMD Vivado Editions

### 5. Install Cable Drivers

- During installation, ensure you check the option to install USB and JTAG cable drivers
- These are necessary for communication between your FPGA board and your system

### 6. License Setup

- After installation, the Vivado License Manager opens
- Choose one of the following options:
  - Obtain a free WebPACK license (suitable for small FPGAs)

- Use a purchased license key
- Follow prompts to generate and activate the license

## 7. Environment Setup

- Add Vivado binary paths to the system's environment variables if needed (optional for command-line usage)
- Restart your computer to apply settings

## 9. Verify Installation

- Open Vivado
- Create a new project to test whether the environment is working correctly
- Connect your FPGA board and verify that it is detected under the Hardware Manager.

## 4.2 VEDIC MULTIPLIER SIMULATION PROCESS IN AMD VIVADO

Below are the steps to properly setup the vedic multiplier project in AMD Vivado:

### 1. Open Vivado and create a new project:

- Click on Create New Project.
- Enter a project name (e.g., vedic\_multiplier\_sim).
- Choose project location and click Next.
- Select RTL Project, enable Do not specify sources at this time, then Next.
- Choose your FPGA board or part (e.g., XC7A100T for Artix-7), then Finish.



**Fig 4.2.1:** Vivado Setup Window

## 2. Add VHDL/Verilog Source Files

- Go to Project Manager > Add Sources > Add or Create Design Sources.
- Add your Vedic Multiplier design file (e.g., vedic\_multiplier.v).
- Ensure the top module is defined.

## 3. Create Testbench

- Add Sources > Add or Create Simulation Sources.
- Create a testbench file (e.g., tb\_verdic\_multiplier.v).



**Fig 4.2.2:** Simulation Interface

## 4. Run Behavioral Simulation

- Go to Flow Navigator > Simulation > Run Simulation > Run Behavioral Simulation.
- Vivado launches the simulation and shows waveform results.
- Observe signal transitions and verify the output.

## 5. Synthesize and Implement (Optional)

If you want to see how the design performs on FPGA:

- Click Run Synthesis.
- Then click Run Implementation.
- Generate Bitstream if needed.

# CHAPTER 5

## RESULT

### 5.1 RESULT



**Fig 5.1.:** Output Simulation for a=15, b=5

#### 1. Signal Breakdown:

- a[31:0]: Input operand A (32-bit)
- b[31:0]: Input operand B (32-bit)
- prod[63:0]: Output product (64-bit, since 32-bit  $\times$  32-bit = 64-bit)

#### 2. Time Points:

At the time marker 33 ns, inputs and output are stable, and we can examine the values:

##### Values at 33 ns:

- a = 15
- b = 3
- prod = 45

##### Explanation:

- $15 \times 3 = 45$  — this confirms the multiplier is functioning correctly for this input.

#### Other Time Segments:

##### 1. 10 ns:

- a = 12345678
- b = 0
- prod = 0
- Correct: Anything multiplied by 0 yields 0.

##### 2. 20 ns:

- a = 0
- b = 98765432
- prod = 0
- Again, multiplication by 0 is valid.

##### 3. 40 ns:

- a = 4294967295 (Max 32-bit unsigned value =  $2^{32} - 1$ )
- b = 3
- prod = 12884901885
- Check:  $4294967295 \times 3 = 12884901885$

##### 4. 50 ns:

- a = 2779096485
- b = 1515870810
- prod = 4212751239785102580

- This also matches the expected result (from external multiplication).



**Fig 5.2:** Output simulation for  $a=65535$ ,  $b=4294901760$

### 5. 60 ns:

- $a = 65535$
- $b = 4294901760$
- $\text{prod} = 281466386841600$
- Again correct (this seems to be multiplying max 16-bit with a 32-bit high mask).

This simulation confirms that the 32-bit Vedic multiplier module is working correctly, producing accurate 64-bit results across a wide range of test cases including small numbers, zeros, and large unsigned integers. The appearance of negative values indicates the simulator may be interpreting the output as signed—which can be addressed by explicitly defining the product signal as unsigned if required.

## 5.2 ADVANTAGES

- Improved Speed:** The simulation results in AMD Vivado show that the 32-bit Vedic multiplier achieves lower delay compared to conventional multipliers. Specifically, it has a delay of **2.823 ns**, which is an improvement over older techniques .
- Reduced Area Utilization:** The Vedic multiplier uses fewer Look-Up Tables (LUTs), slices, and flip-flops, making it more efficient in terms of hardware resource usage. The report notes **327 LUTs, 193 slices, and 393 flip-flops**, indicating area efficiency .
- Low Power Consumption:** Due to its efficient architecture and reduced switching activity, the Vedic multiplier consumes less power. This is particularly beneficial in applications requiring energy efficiency.

4. **Scalability:** The architecture is highly modular, allowing easy extension to higher bit widths. The 32-bit design is built using smaller 2x2 Vedic multipliers, showing how scalable and reusable the design.
5. **Parallel Processing:** The Vedic multiplication technique allows for parallel execution of smaller multipliers, which leads to faster computation and efficient utilization of resources.
6. **Effective for FPGA Implementation:** The simulation using AMD Vivado validates the design's suitability for real-world FPGA platforms, ensuring its practicality in embedded and digital signal processing applications.

### 5.3 APPLICATIONS

1. **Digital Signal Processing (DSP):**
  - Used in algorithms for fast Fourier transforms (FFT), convolution, and filtering operations.
  - Enhances speed and area efficiency in DSP hardware.
2. **Arithmetic Logic Units (ALUs):**
  - Improves performance of arithmetic operations like multiplication in processors.
  - Particularly useful in ALUs of CPUs, microcontrollers, and embedded systems.
3. **Image Processing:**
  - Accelerates pixel-level operations such as transformations and filtering.
  - Important for real-time image and video processing applications.
4. **Cryptography:**
  - Supports fast modular multiplications essential in RSA and ECC algorithms.
  - Used in secure hardware accelerators.
5. **Artificial Intelligence and Machine Learning:**
  - Applied in hardware acceleration of neural networks and matrix multiplications.
6. **FPGA and ASIC Implementations:**
  - Efficiently implemented in FPGA platforms like AMD Vivado for low power and high-speed operations.
  - Optimized for VLSI designs where area and power are critical.
7. **Wireless Communication Systems:**
  - Plays a key role in modulation/demodulation, coding, and signal detection processes.

## CHAPTER 6

### CONCLUSION & FUTURE SCOPE

#### **6.1 CONCLUSION:**

The development and implementation of the Modified 32-bit Vedic Multiplier reflect a significant advancement in the field of digital arithmetic design. This project successfully combines the elegance of Vedic mathematics, specifically the Urdhva Tiryakbhyam Sutra, with contemporary design principles such as hierarchical architecture, modular construction, and carry-save addition to produce a multiplier that is both computationally efficient and practically deployable in real-time systems.

#### **Synthesis of Ancient Principles with Modern Technology:**

At the heart of this work is the innovative application of the Urdhva Tiryakbhyam Sutra—a centuries-old Vedic technique for multiplication—adapted for binary arithmetic. Unlike conventional multiplication algorithms that depend heavily on sequential addition and shifting operations, the Vedic approach allows for parallel generation and accumulation of partial products, leading to substantial reductions in delay. By mapping this technique to digital logic, the design inherits the benefits of parallelism, regular structure, and simplicity, making it ideal for VLSI implementation. The result is a high-speed multiplier that outperforms traditional designs, both in simulation and hardware synthesis.

#### **Modular and Hierarchical Architecture**

One of the cornerstones of this project is the adoption of a modular, hierarchical structure. The 32-bit multiplier is not built monolithically but rather constructed from proven, smaller components:

- 2-bit → 4-bit → 8-bit → 16-bit → 32-bit

Each of these stages has been meticulously designed, simulated, and verified individually, ensuring accuracy, scalability, and reliability. This structure not only simplifies the verification process but also enables reuse of modules in other designs, thus promoting design efficiency and maintainability.

**Performance-Oriented Enhancements:**

Key performance-driven enhancements distinguish this multiplier from standard implementations:

1. **Carry Save Adders (CSAs)** are used extensively to minimize delay during the addition of multiple partial products. Instead of propagating carry bits at each stage, the design saves them until the final addition stage—thereby improving speed and reducing logic depth.
2. **Balanced and optimized partial product generation and alignment** ensure that signal congestion and logic fanout are kept minimal, facilitating better performance during hardware mapping on both FPGAs and ASICs.
3. **Flexible final addition stage**, allowing for integration with various types of adders (Ripple Carry, Carry Look-Ahead, or Kogge-Stone) depending on the design goals—such as area minimization or delay reduction.

**Hardware and Implementation Readiness:**

The design has been crafted with hardware synthesis and implementation in mind. Key aspects include:

1. Efficient mapping to FPGA slices and LUTs
2. Low gate count through logic reuse
3. Predictable timing paths for easier synthesis and place-and-route in ASIC tools

These features make the design highly suitable for a wide range of real-world applications where speed, area, and power are critical constraints.

**Suitability for Modern Applications:**

This modified Vedic multiplier can be seamlessly integrated into:

1. DSP processors
2. Embedded computing systems
3. Cryptographic hardware
4. Automotive ECUs
5. High-speed image and signal processing pipelines

Its pipelining-friendly architecture also makes it adaptable for high-frequency systems, further

increasing its versatility and relevance in contemporary digital design.

### **Final Reflections:**

This project demonstrates the potential of blending ancient mathematical wisdom with modern hardware techniques to develop solutions that are not only theoretically sound but also industrially viable. The Modified 32-bit Vedic Multiplier stands as a symbol of innovation, delivering speed, efficiency, and structural elegance in a single package.

By focusing on performance, optimization, and modularity, this project paves the way for future exploration into Vedic arithmetic circuits. It also sets a strong foundation for further extensions, such as pipelined or parallelized architectures, low-power variants, or even integrating machine learning-based hardware multipliers.

In conclusion, this work is a testament to the enduring relevance of Vedic mathematics and its powerful synergy with digital design, embodying a future-ready, high-performance arithmetic architecture that pushes the boundaries of conventional computation.

This has enhanced the understanding of essential concepts such as database management, interface design, and user interaction. Challenges encountered during the development phase offered opportunities for problem-solving and innovation, reinforcing both technical proficiency and critical thinking skills. The iterative development model enabled continuous refinement, aligning the final product closely with user expectations and functional requirements.

This work lays the foundation for future improvements and potential extensions. There is scope for integrating more advanced technologies such as AI for predictive analytics, mobile compatibility for broader accessibility, and cloud deployment for better scalability. Ultimately, this project underscores the importance of practical implementation of theoretical knowledge and reflects a significant stride in the contributor's professional and academic journey.

## **6.2 FUTURE SCOPE:**

### **Future Scope of the Modified 32-bit Vedic Multiplier**

The Modified 32-bit Vedic Multiplier serves not only as a high-performance arithmetic unit for present-day digital systems but also as a launchpad for future innovations in the domain of digital design, VLSI systems, and computational hardware. As technology continues to evolve toward faster, more energy-efficient, and scalable systems, this multiplier can be extended and adapted in several transformative directions.

Below is an in-depth look at the potential future enhancements and applications of this project:

### **1. Pipelined Architecture for High-Frequency Systems:**

While the current design is modular and optimized for speed, it operates in a non-pipelined structure. Introducing pipelining would allow:

- High throughput, as multiple multiplication operations can be processed simultaneously across pipeline stages.
- Higher clock frequency operation, especially beneficial for DSPs, real-time systems, and processor ALUs.
- Deterministic timing, improving system synchronization and reliability.

This would involve breaking the critical path using pipeline registers and optimizing each stage for minimal setup and hold violations.

### **2. Low Power Variants for Embedded and IoT Devices**

With the rise of battery-powered embedded systems and IoT platforms, power consumption becomes a top design priority. Future enhancements could include:

- Clock gating and power gating techniques to disable inactive modules.
- Asynchronous designs to eliminate global clock dependencies.
- Dynamic voltage and frequency scaling (DVFS) integrated with the multiplier control logic.

A low-power version would be highly suitable for wearable devices, wireless sensors, and mobile SoCs.

### **3. Extension to Floating-Point Multiplication**

Currently, the design handles fixed-point (integer) multiplication. A logical and valuable extension is to develop:

- Single-precision and double-precision floating-point multipliers based on IEEE-754 format.
- Vedic-inspired methods for mantissa multiplication, paired with efficient exponent addition and normalization.

This would open up applications in scientific computing, machine learning accelerators, and 3D graphics engines, where floating-point arithmetic is dominant.

### **4. Integration into Application-Specific Instruction Set Processors (ASIPs)**

The modular nature of the Vedic multiplier makes it ideal for integration into custom instruction set processors, where specific workloads (e.g., signal processing, encryption, AI operations) require specialized hardware. Future developments could focus on:

- Designing a custom instruction for Vedic multiplication.
- Integrating the multiplier in RISC-V or ARM-based cores as a co-processor or arithmetic accelerator.

This could significantly improve performance in real-time applications like radar signal processing, audio enhancement, and neural network inference.

## 5. Hardware Implementation and ASIC Fabrication

The design is currently validated through simulation and logical synthesis. The next frontier is physical implementation on silicon, involving:

- Layout design and tape-out for fabrication.
- Post-silicon validation, including power, area, and performance analysis.
- Design for Testability (DFT) strategies, like scan chains and built-in self-test (BIST), to ensure real-time fault detection.

This step is crucial for transitioning from an academic prototype to a commercial IP core or product-ready hardware unit.

## 6. Reconfigurable Multiplier for Adaptive Systems

In dynamic systems where performance and energy needs vary, a reconfigurable multiplier can adjust its bit-width or precision in real-time:

- Use of partial products only as required, depending on operand size.
- Power-aware reconfiguration to deactivate unused logic.
- Bit-slicing techniques to support 8, 16, or 32-bit multiplication within the same architecture.

Such designs are particularly valuable in adaptive DSPs, AI edge devices, and smart robotics.

## 7. Integration with AI and Machine Learning Hardware

Matrix multiplication is central to many machine learning models. Extending the Vedic multiplier to array multipliers or tensor processing units (TPUs) can be a high-impact direction.

Enhancements may include:

- Parallelized Vedic MAC (Multiply-Accumulate) units.
- Hardware-friendly implementation of convolution operations using Vedic logic.
- Custom Vedic-based acceleration for deep learning inference engines.

This could lead to compact, energy-efficient AI chips for real-time vision, language processing, and autonomous systems.

## 8. Exploration of Vedic Division and Other Arithmetic Units

The success of Vedic multiplication can be leveraged to design:

- Vedic division circuits for high-speed reciprocal or division operations.
- Vedic square root and exponentiation units, using the same sutra-based principles.

This will contribute to a complete arithmetic suite built on Vedic mathematics, offering uniformity, speed, and architectural synergy.

## REFERENCES

- [1] S. Akhter, "VHDL implementation of fast NxN multiplier based on Vedic mathematics," in Proc. 18th European Conference on Circuit Theory and Design, 2007, pp. 472-475
- [2] S. Nagaraj, Dr.G.M. Sreerama Reddy and Dr.S. Aruna Mastani; A Comparative Study on Different Multipliers-SurveyJournal of Advanced Research in Dynamical and Control Systems14739-7522018Institute of Advanced Scientific Research.
- [3] M.Pushpa, S. Nagaraj, Design and Analysis of 8-bit Array, Carry Save Array, Braun, Wallace Tree and Vedic Multipliers, IEEE Sponsored International Conference On New Trends In Engineering & Technology( ICNTET 2018).
- [4] Nagaraj, S; Thyagarajan, K; Srihari, D; Gopi, K; Design and Analysis of Wallace Tree Multiplier for CMOS and CPL Logic2018 International Conference on Computation of Power, Energy, Information and Communication (ICCPEIC)006-0102018IEEE
- [5] Josmin Thomas ; R. Pushpangadan ; S Jinesh Comparative study of performance Vedic multiplier on the Basis of Adders used 2015 IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE)
- [6] S. Nagaraj, Dr.G.M. Sreerama Reddy and Dr.S. Aruna Mastani; A Survey on Adiabatic LogicInternational Conference on Communications, Signal Processing and VLSI(IC2SV2019),Springer Conference ,National Institute of Technology, Warangal.
- [7] S. Nagaraj,K.Venkataramana Reddy and and P.Anil Kumar3i;Analysis of Vedic Multiplier for Conventional CMOS & Complementary Pass Transistor Logic(CPL) Logics SCOPUS Indexed Springer 8th International Conference on Innovations in Electronics and Communication Engineering, (ICIECE-2019).
- [8] Au L.S. and Burgess N. (2002), "A (4:2) adder for unified GF( $p$ ) and GF( $2^n$ ) Galois field Multipliers", Proceedings of 36th IEEE Asilomar Conference on Signals, Systems, and Computers, vol. 2, pp. 1619-1623.
- [9] Chittibabu A., Sola V.K. and Raj C.P. (2006), "ASIC Implementation of New Architecture for constant coefficient Dadda multiplier for High Speed DSP applications", Proceedings of the National Conference on Recent trends in Electrical, Electronics and Computer Engineering, JCECON, pp. 299– 304.

## APPENDIX

### CODE:

- **vedicmult32bit**

```
module vedicmult32bit(product,a,b);
    input [31:0]a,b;
    output [63:0]product;
    wire [31:0]p0,p1,p2,p3;
    vm16bit u1(p0,a[15:0],b[15:0]);
    vm16bit u2(p1,a[31:16],b[15:0]);
    vm16bit u3(p2,a[15:0],b[31:16]);
    vm16bit u4(p3,a[31:16],b[31:16]);
    ///////////////
    wire [32:0]sum1,sum2;
    wire c1,c2;
    csa32bit csa32bit1(sum1,c1,p1,p2,32'd0,1'b0);
    csa32bit csa32bit2(sum2,c2,sum1[31:0],{16'b0,p0[31:16]},{p3[15:0],16'b0},1'b0);
    wire orresult;
    assign orresult=sum1[32]|sum2[32];
    /////////////////////////declaring product terms;
    assign product[15:0]=p0[15:0];
    assign product[47:16]=sum2[31:0];
    wire c3,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19;
    ha add1(product[48],c3,p3[16],orresult);
    ha add2(product[49],c5,p3[17],c3);
    ha add3(product[50],c6,p3[18],c5);
    ha add4(product[51],c7,p3[19],c6);
    ha add5(product[52],c8,p3[20],c7);
    ha add6(product[53],c9,p3[21],c8);
    ha add7(product[54],c10,p3[22],c9);
    ha add8(product[55],c11,p3[23],c10);
```

```
ha add9(product[56],c12,p3[24],c11);
ha add10(product[57],c13,p3[25],c12);
ha add11(product[58],c14,p3[26],c13);
ha add12(product[59],c15,p3[27],c14);
ha add13(product[60],c16,p3[28],c15);
ha add14(product[61],c17,p3[29],c16);
ha add15(product[62],c18,p3[30],c17);
ha add16(product[63],c19,p3[31],c18);
endmodule
```

- **csa32bit**

```
module csa32bit(sum,cout,x,y,z,cin);
input [31:0]x,y,z;
input cin;
output [32:0]sum;
wire [31:0]sum1;
output cout;
wire [31:0]carry1,carry2;
assign sum1=x^y^z;
assign carry1=x&y |y&z |z&x;
fulladder fa1(sum[0],carry2[0],sum1[0],1'b0,(cin));
fulladder fa2(sum[1],carry2[1],sum1[1],carry1[0],carry2[0]);
fulladder fa3(sum[2],carry2[2],sum1[2],carry1[1],carry2[1]);
fulladder fa4(sum[3],carry2[3],sum1[3],carry1[2],carry2[2]);
fulladder fa5(sum[4],carry2[4],sum1[4],carry1[3],carry2[3]);
fulladder fa6(sum[5],carry2[5],sum1[5],carry1[4],carry2[4]);
fulladder fa7(sum[6],carry2[6],sum1[6],carry1[5],carry2[5]);
fulladder fa8(sum[7],carry2[7],sum1[7],carry1[6],carry2[6]);
fulladder fa9(sum[8],carry2[8],sum1[8],carry1[7],carry2[7]);
fulladder fa10(sum[9],carry2[9],sum1[9],carry1[8],carry2[8]);
fulladder fa11(sum[10],carry2[10],sum1[10],carry1[9],carry2[9]);
fulladder fa12(sum[11],carry2[11],sum1[11],carry1[10],carry2[10]);
```

```
fulladder fa13(sum[12],carry2[12],sum1[12],carry1[11],carry2[11]);
fulladder fa14(sum[13],carry2[13],sum1[13],carry1[12],carry2[12]);
fulladder fa15(sum[14],carry2[14],sum1[14],carry1[13],carry2[13]);
fulladder fa16(sum[15],carry2[15],sum1[15],carry1[14],carry2[14]);
fulladder fa17(sum[16],carry2[16],sum1[16],carry1[15],carry2[15]);
fulladder fa18(sum[17],carry2[17],sum1[17],carry1[16],carry2[16]);
fulladder fa19(sum[18],carry2[18],sum1[18],carry1[17],carry2[17]);
fulladder fa20(sum[19],carry2[19],sum1[19],carry1[18],carry2[18]);
fulladder fa21(sum[20],carry2[20],sum1[20],carry1[19],carry2[19]);
fulladder fa22(sum[21],carry2[21],sum1[21],carry1[20],carry2[20]);
fulladder fa23(sum[22],carry2[22],sum1[22],carry1[21],carry2[21]);
fulladder fa24(sum[23],carry2[23],sum1[23],carry1[22],carry2[22]);
fulladder fa25(sum[24],carry2[24],sum1[24],carry1[23],carry2[23]);
fulladder fa26(sum[25],carry2[25],sum1[25],carry1[24],carry2[24]);
fulladder fa27(sum[26],carry2[26],sum1[26],carry1[25],carry2[25]);
fulladder fa28(sum[27],carry2[27],sum1[27],carry1[26],carry2[26]);
fulladder fa29(sum[28],carry2[28],sum1[28],carry1[27],carry2[27]);
fulladder fa30(sum[29],carry2[29],sum1[29],carry1[28],carry2[28]);
fulladder fa31(sum[30],carry2[30],sum1[30],carry1[29],carry2[29]);
fulladder fa32(sum[31],carry2[31],sum1[31],carry1[30],carry2[30]);
fulladder fa33(sum[32],cout,1'b0,carry1[31],carry2[31]);
endmodule
```

- **vm16bit**

```
module vm16bit(product,a,b);
input [15:0]a,b;
output [31:0]product;
wire [15:0]p0,p1,p2,p3;
vm8bit m1(p0,a[7:0],b[7:0]);
vm8bit m2(p1,a[7:0],b[15:8]);
vm8bit m3(p2,a[15:8],b[7:0]);
vm8bit m4(p3,a[15:8],b[15:8]);
```

```
//giving inputs to carry save adders
wire [16:0]sum1,sum2;
wire c1,c2;
csa16bit csa1(sum1,c1,p1,p2,16'b0000000000000000,1'b0);
csa16bit csa2(sum2,c1,sum1[15:0],{8'b00000000,p0[15:8]},{p3[7:0],8'b00000000},1'b0);
wire orresult;
assign orresult=sum1[16]|sum2[16];
///assigning product values
assign product[7:0]=p0[7:0];
assign product[23:8]=sum2[15:0];
wire c3,c4,c5,c6,c7,c8,c9,c10;
ha add1(product[24],c3,p3[8],orresult);
ha add2(product[25],c4,p3[9],c3);
ha add3(product[26],c5,p3[10],c4);
ha add4(product[27],c6,p3[11],c5);
ha add5(product[28],c7,p3[12],c6);
ha add6(product[29],c8,p3[13],c7);
ha add7(product[30],c9,p3[14],c8);
ha add8(product[31],c10,p3[15],c9);
endmodule
```

- **csa16bit**

```
module csa16bit(sum,cout,x,y,z,cin);
input [15:0]x,y,z;
input cin;
output [16:0]sum;
wire [15:0]sum1;
output cout;
wire [15:0]carry1,carry2;
assign sum1=x^y^z;
assign carry1=x&y | y&z | z&x;
fulladder fa1(sum[0],carry2[0],sum1[0],1'b0,(cin));
```

```
fulladder fa2(sum[1],carry2[1],sum1[1],carry1[0],carry2[0]);
fulladder fa3(sum[2],carry2[2],sum1[2],carry1[1],carry2[1]);
fulladder fa4(sum[3],carry2[3],sum1[3],carry1[2],carry2[2]);
fulladder fa5(sum[4],carry2[4],sum1[4],carry1[3],carry2[3]);
fulladder fa6(sum[5],carry2[5],sum1[5],carry1[4],carry2[4]);
fulladder fa7(sum[6],carry2[6],sum1[6],carry1[5],carry2[5]);
fulladder fa8(sum[7],carry2[7],sum1[7],carry1[6],carry2[6]);
fulladder fa9(sum[8],carry2[8], sum1[8],carry1[7], carry2[7]);
fulladder fa10(sum[9],carry2[9],sum1[9],carry1[8],carry2[8]);
fulladder fa11(sum[10],carry2[10],sum1[10],carry1[9],carry2[9]);
fulladder fa12(sum[11],carry2[11],sum1[11],carry1[10],carry2[10]);
fulladder fa13(sum[12],carry2[12],sum1[12],carry1[11],carry2[11]);
fulladder fa14(sum[13],carry2[13],sum1[13],carry1[12],carry2[12]);
fulladder fa15(sum[14],carry2[14],sum1[14],carry1[13],carry2[13]);
fulladder fa16(sum[15],carry2[15],sum1[15],carry1[14],carry2[14]);
fulladder fa17(sum[16],cout,1'b0,carry1[15],carry2[15]);
endmodule
```

- **vm8bit**

```
module vm8bit(product,a,b);
input [7:0]a,b;
output [15:0]product;
wire [7:0]p0,p1,p2,p3;
vm4bit m1(p0,a[3:0],b[3:0]);
vm4bit m2(p1,a[3:0],b[7:4]);
vm4bit m3(p2,a[7:4],b[3:0]);
vm4bit m4(p3,a[7:4],b[7:4]);
wire [8:0]sum1,sum2;
wire c1,c2;
carrysaveadder8bit cs1(sum1,c1,p1,p2,8'b00000000,1'b0);
carrysaveadder8bit cs2(sum2,c2,sum1[7:0],{4'b0000,p0[7:4]}, {p3[3:0],4'b0000},1'b0);
wire orresult;
```

```
assign orresult=sum1[8]|sum2[8];
assign product[3:0]=p0[3:0];
assign product[11:4]=sum2[7:0];
wire c3,c4,c5,c6;
ha add1(product[12],c3,p3[4],orresult);
ha add2(product[13],c4,p3[5],c3);
ha add3(product[14],c5,p3[6],c4);
ha add4(product[15],c6,p3[7],c5);
endmodule
```

- **carrysaveadder8bit**

```
module carrysaveadder8bit(sum,cout,x,y,z,cin);
input [7:0]x,y,z;
input cin;
output [8:0]sum;
wire [7:0]sum1;
output cout;
wire [7:0]carry1,carry2;
assign sum1=x^y^z;
assign carry1=x&y |y&z |z&x;
fulladder fa1(sum[0],carry2[0],sum1[0],1'b0,(cin));
fulladder fa2(sum[1],carry2[1],sum1[1],carry1[0],carry2[0]);
fulladder fa3(sum[2],carry2[2],sum1[2],carry1[1],carry2[1]);
fulladder fa4(sum[3],carry2[3],sum1[3],carry1[2],carry2[2]);
fulladder fa5(sum[4],carry2[4],sum1[4],carry1[3],carry2[3]);
fulladder fa6(sum[5],carry2[5],sum1[5],carry1[4],carry2[4]);
fulladder fa7(sum[6],carry2[6],sum1[6],carry1[5],carry2[5]);
fulladder fa8(sum[7],carry2[7],sum1[7],carry1[6],carry2[6]);
fulladder fa9(sum[8],cout,1'b0,carry1[7],carry2[7]);
endmodule
```

- **vm4bit**

```
module vm4bit(product,a,b);
input [3:0] a,b;
output [7:0]product;
wire [3:0]p0,p1,p2,p3;
wire [4:0]sum1,sum2;
wire c1,c2,orresult;
vedicmultiplier2bit m1(p0,a[1:0],b[1:0]);
vedicmultiplier2bit m2(p1,a[1:0],b[3:2]);
vedicmultiplier2bit m3(p2,b[1:0],a[3:2]);
vedicmultiplier2bit m4(p3,a[3:2],b[3:2]);
//partially product terms are completed according to block diagram ie;4 two bit multiplier
outputs are obtained
//adding cross products and 4'b0000 with carry save adder according to diagram
carrysaveadder cs1(sum1,c1,p1,p2,4'b0000,1'b0);
//csa2 is used to compute further sum of obtained sum1,msb 2bits of p0,lsb two bits of p3
carrysaveadder cs2(sum2,c2,sum1[3:0],{2'b00,p0[3:2]},{p3[1:0],2'b00},1'b0);
//doing or operation for carry and carry2 obtained from carry sav eadders
or(orresult,sum1[4],sum2[4]);//here we used carry bits as msb of sum1 and sum2 because
the csa sum is max of 5bits but we used only 4bits so we use the left over bits for or result
//assign product values upto 5
assign product[1:0]=p0[1:0];
assign product[5:2]=sum2[3:0];
//adding or result with 2msb bits of the p3 and obtainig result of product terms 6 and 7 as
follows
xor(product[6],orresult,p3[2]);
wire c3;
and(c3,orresult,p3[2]);
xor(product[7],c3,p3[3]);
endmodule
```

- **carrysaveadder**

```
module carrysaveadder(sum,cout,x,y,z,cin);
    input [3:0] x,y,z;
    input cin;
    output [4:0] sum;
    output cout;
    wire [3:0]sum1;
    wire [3:0]carry;
    wire [3:0]c1;
    assign sum1= x ^ y ^ z;
    assign carry=(x&y) | (y&z) | (z&x);
    adder fa1(sum[0],c1[0],sum1[0],cin,1'b0);
    adder fa2(sum[1],c1[1],sum1[1],carry[0],c1[0]);
    adder fa3(sum[2],c1[2],sum1[2],carry[1],c1[1]);
    adder fa4(sum[3],c1[3],sum1[3],carry[2],c1[2]);
    adder fa5(sum[4],cout,1'b0,carry[3],c1[3]);
endmodule
```

- **adder**

```
module adder(s,c,x,y,z);
    input x,y,z;
    output s,c;
    assign s=x^y^z;
    assign c=(x&y)|(y&z)|(z&x);
endmodule
```

- **fulladder**

```
module fulladder(s,c,x,y,z);
    input x,y,z;
    output s,c;
```

```
assign s=x^y^z;
assign c=(x&y)|(y&z)|(z&x);
endmodule
```

- **ha**

```
module ha(s,c,a,b);
input a,b;
output s,c;
assign s=a^b;
assign c=a&b;
endmodule
```

- **vedicmultiplier2bit**

```
module vedicmultiplier2bit(p,a,b);
input [1:0] a,b;
output [3:0]p;
wire carry1,mul1,mul2,mul3;
assign p[0]=a[0]&b[0];
assign mul1=a[1]&b[1];
assign mul2=a[1]&b[0];
assign mul3=a[0]&b[1];
halfadder ha1(p[1],carry1,mul2,mul3);
halfadder ha2(p[2],p[3],mul1,carry1);
endmodule
```

- **halfadder**

```
module halfadder(sum,carry,a,b);
input a,b;
output sum,carry;
assign sum=a^b;
assign carry=a&b;
```

```
endmodule
```

- **vedicmult32bit\_tb**

```
`timescale 1ns / 1ps
```

```
module vedicmult32bit_tb;
```

```
    // Inputs
```

```
    reg [31:0] a;
```

```
    reg [31:0] b;
```

```
    // Output
```

```
    wire [63:0] product;
```

```
    // Instantiate the Unit Under Test (UUT)
```

```
    vedicmult32bit uut (
```

```
        .product(product),
```

```
        .a(a),
```

```
        .b(b)
```

```
    );
```

```
    // Procedure to apply stimulus
```

```
    initial begin
```

```
        $display("Starting testbench for vedicmult32bit...");
```

```
        // Monitor changes
```

```
        $monitor("Time = %0t | a = %h | b = %h | product = %h", $time, a, b, product);
```

```
        // Test Case 1: Zero multiplication
```

```
        a = 32'd0; b = 32'd0;
```

```
        #10;
```

```
        // Test Case 2: One operand zero
```

```
        a = 32'd12345678; b = 32'd0;
```

```
        #10;
```

```
        a = 32'd0; b = 32'd98765432;
```

```
        #10;
```

```
        // Test Case 3: Small values
```

```
a = 32'd15; b = 32'd3;  
#10;  
  
// Test Case 4: Max values  
a = 32'hFFFFFFF; b = 32'hFFFFFFF;  
#10;  
  
// Test Case 5: Random large numbers  
a = 32'hA5A5A5A5; b = 32'h5A5A5A5A;  
#10;  
  
// Test Case 6: Another pattern  
a = 32'h0000FFFF; b = 32'hFFF0000;  
#10;  
  
// Done  
$display("Testbench completed.");  
$finish;  
end  
endmodule
```