



**Name: Kirllos Atef Shawky Elia**

**Project: 64-bit Parallel Prefix Adder (Brent-Kung Method)**

**GitHub:** [https://github.com/kirlloselia/ppa_brentkung](https://github.com/kirlloselia/ppa_brentkung)

## Table of Contents

| Section                                                |
|--------------------------------------------------------|
| 1. Specs                                               |
| 2. G,P generation                                      |
| a) RTL schematic                                       |
| b) RTL code                                            |
| 3. SUM                                                 |
| a) RTL schematic                                       |
| b) RTL code                                            |
| 4. Grey Operator                                       |
| a) RTL schematic                                       |
| b) RTL code                                            |
| 5. Black Operator                                      |
| a) RTL schematic                                       |
| b) RTL code                                            |
| 6. Carry Network                                       |
| a) RTL schematic                                       |
| b) RTL code                                            |
| 7. 64 adder using Parallel Prefix Adder Brent Kung     |
| a) RTL schematic                                       |
| b) RTL code                                            |
| 8. testbench code for 64 adder                         |
| a) code                                                |
| b) Waveform                                            |
| c) Test-Bench log                                      |
| 12. IMPLEMENTATION                                     |
| a) Implementation schematic                            |
| 13. Utilization                                        |
| 14. Timing Reports                                     |

## 1. Specs

You are asked to design a 64-bit parallel prefix adder that uses a Brent-Kung carry network by following the steps below, and to attach all the required screenshots and code.

Fig. 3 below shows an example of how to implement a size-8 prefix sum network using the Brent-Kung method.

Fig. 4 shows how the example in Fig. 3 is built step by step. The first layer, which contains (8-1 adders), changes the problem from an 8-input problem to a smaller one with only 4 inputs. The same idea is then applied again using (4-1 adders), which splits the problem in half once more. Please spend some time convincing yourself that you understand the pattern very well, then map it onto the carry network of a 64-bit adder by replacing each + sign with a carry operator. The implementation of the carry operator is also given in Fig. 4.



Figure 3



Figure 4

- Inputs $\{A, B, Cin\}$ ($A$ and $B$ are 64-bit inputs and $Cin$ is 1 bit)
- Outputs $\{Sum, Cout\}$ ($Sum$ is a 64-bit output and $Cout$ is 1 bit)
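The mapping described above, from a size-8 prefix network to a carry network, can be sketched by writing every carry operator out by hand. This is only an illustration under stated assumptions: the module name `bk_carry8_sketch`, its testbench, and the intermediate wire names (`G10`, `P32`, and so on) are hypothetical and do not come from the project sources. `Gjk` stands for the group generate of bits j down to k, with `Cin` already absorbed into bit 0.

```verilog
// Illustrative size-8 Brent-Kung carry network, written out cell by cell.
// All names in this block are hypothetical, chosen for the example only.
module bk_carry8_sketch (
    input  [7:0] g, p,   // bit-level generate and propagate
    input        cin,
    output [7:0] c       // c[i] = carry out of bit i, c[7] = Cout
);
    // Absorb cin into bit 0, so every group that starts at bit 0 needs G only.
    wire g0  = g[0] | (p[0] & cin);

    // Reduction level 1: merge neighbouring bits (8 inputs -> 4 groups).
    wire G10 = g[1] | (p[1] & g0);
    wire G32 = g[3] | (p[3] & g[2]), P32 = p[3] & p[2];
    wire G54 = g[5] | (p[5] & g[4]), P54 = p[5] & p[4];
    wire G76 = g[7] | (p[7] & g[6]), P76 = p[7] & p[6];

    // Reduction level 2: merge neighbouring groups (4 groups -> 2 groups).
    wire G30 = G32 | (P32 & G10);
    wire G74 = G76 | (P76 & G54), P74 = P76 & P54;

    // Reduction level 3: one cell produces the carry out of bit 7.
    wire G70 = G74 | (P74 & G30);

    // Expansion level 1: fill in the carry halfway between bit 3 and bit 7.
    wire G50 = G54 | (P54 & G30);

    // Expansion level 2: fill in the remaining even-indexed carries.
    wire G20 = g[2] | (p[2] & G10);
    wire G40 = g[4] | (p[4] & G30);
    wire G60 = g[6] | (p[6] & G50);

    assign c = {G70, G60, G50, G40, G30, G20, G10, g0};
endmodule

// Exhaustive self-check of the sketch against the built-in "+" operator.
module bk_carry8_sketch_tb;
    reg  [7:0] a, b;
    reg        cin;
    wire [7:0] c;
    wire [7:0] g = a & b, p = a ^ b;
    wire [7:0] s = p ^ {c[6:0], cin};   // sum bit i uses the carry into bit i
    integer i, errors;

    bk_carry8_sketch dut (.g(g), .p(p), .cin(cin), .c(c));

    initial begin
        errors = 0;
        for (i = 0; i < (1 << 17); i = i + 1) begin
            {cin, b, a} = i;            // walk through every input combination
            #1;
            if ({c[7], s} !== a + b + cin) errors = errors + 1;
        end
        $display("mismatches: %0d", errors);
        $finish;
    end
endmodule
```

The 64-bit network repeats the same pattern with more levels; the sketch only makes explicit which groups each level combines.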

## 2. G,P generation

### a) RTL schematic



### b) RTL code

```
module gp_logic #(
    parameter N = 64
)(
    input  [N-1:0] x, y,
    output [N-1:0] g, p
);
    assign g = x & y;   // generate: both operand bits are 1
    assign p = x ^ y;   // propagate: exactly one operand bit is 1
endmodule
```

## 3. SUM

### a) RTL schematic



### b) RTL code

```
module sum_logic #(
    parameter N = 64
)(
    input  [N-1:0] c, p,   // c[i] is the carry into bit i
    output [N-1:0] s
);
    assign s = c ^ p;
endmodule
```

## 4. Grey Operator

### a) RTL schematic



### b) RTL code

```
// grey cell: g output only
module grey_carry_operator (
    input        p,      // p''
    input  [1:0] g,      // g'', g'
    output       G
);

    assign G = g[1] | (p & g[0]);

endmodule
```

## 5. Black Operator

### a) RTL schematic



### b) RTL code

```
// black cell: pg outputs
module black_carry_operator (
    input  [1:0] p, g,   // index 1 = p'', g'' (upper group); index 0 = p', g' (lower group)
    output       P, G
);

    assign P = &p;
    assign G = g[1] | (p[1] & g[0]);

endmodule
```
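In equation form, the two cells combine an upper group $(g'', p'')$ with a lower group $(g', p')$, using the same primes as the comments in the RTL. The symbol $\circ$ for the carry operator is a notational assumption made here, not something taken from the project sources.

$$
\begin{aligned}
\text{black cell:}\quad (G, P) &= (g'', p'') \circ (g', p') = \bigl(g'' \lor (p'' \land g'),\; p'' \land p'\bigr) \\
\text{grey cell:}\quad G &= g'' \lor (p'' \land g')
\end{aligned}
$$

The grey cell is the black cell with the $P$ output dropped. That is enough wherever the lower group already reaches down to $Cin$, because no later cell needs a propagate signal for such a group.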

## 6. Carry Network

### a) RTL schematic



### b) RTL code

```

// carry network
module bk_prefix_tree #(
    parameter N = 16
)(
    input  [N-1:0] p, g,
    input          Cin,
    output [N-1:0] C      // C[i] = carry out of bit i; C_-1 = Cin, C_N-1 = Cout
);
    localparam red_stages_no   = $clog2(N);
    localparam exp_stages_no   = $clog2(N) - 1;
    localparam total_stages_no = red_stages_no + exp_stages_no; // total number of stages: 2log2(N)-1
    localparam NDIV2 = N >> 1;

    // assign C[-1] = Cin;

    // note that the number of intermediate g signals over all reduction
    // levels adds up to N-1 (N/2 + N/4 + .. + 1 = N-1)
    wire [N-2:0] g_inter_red, p_inter_red;

    // handle Cin
    wire g0;
    grey_carry_operator op_minus1 (.g({g[0], Cin}), .p(p[0]), .G(g0));
    assign C[0] = g0;

    // first level, takes input from module's input
    // NOTE: bk_reduction_stage is not listed in this report; the .g and .G
    // connections to it (here and below) are assumed from the surrounding code.
    bk_reduction_stage #(.N(N)) reduction_stage1 (
        .p(p[N-1:1]), .g({g[N-1:1], g0}),
        .G(g_inter_red[NDIV2-1:0]), .P(p_inter_red[NDIV2-1:0])
    );

    assign C[1] = g_inter_red[0];    // after level 1, x1 is ready

    genvar i;
    genvar j;
    generate
        for (i = 2; i <= red_stages_no; i = i + 1) begin : reduction_stage
            // input index to current level = 2^levels - 2^(levels-i+2)
            localparam in_idx   = (1 << red_stages_no) - (1 << (red_stages_no - i + 2));
            // input size = N >> (i-1)
            localparam in_size  = N >> (i - 1);
            // output index of current level = input index to current level + input size
            localparam out_idx  = in_idx + in_size;
            // output size = input size / 2
            localparam out_size = in_size >> 1;
            wire [in_size-1:0] g_inter_red_in = g_inter_red[in_size+in_idx-1:in_idx];
            wire [in_size-1:1] p_inter_red_in = p_inter_red[in_size+in_idx-1:in_idx+1];
            if (i == red_stages_no) begin
                // last level: a single output, only its generate is used
                bk_reduction_stage #(.N(in_size)) reduction_stage (
                    .p(p_inter_red_in), .g(g_inter_red_in),
                    .G(g_inter_red[out_idx])
                );
            end
            else begin
                bk_reduction_stage #(.N(in_size)) reduction_stage (
                    .p(p_inter_red_in), .g(g_inter_red_in),
                    .G(g_inter_red[out_size+out_idx-1:out_idx]),
                    .P(p_inter_red[out_size+out_idx-1:out_idx])
                );
            end
            localparam c_index = (1 << i) - 1; // (2^level) - 1, level = i
            assign C[c_index] = g_inter_red[out_idx];
        end
    endgenerate

    generate
        for (i = 1; i <= exp_stages_no; i = i + 1) begin : expansion_stage
            // localparam stage_size = (1<<i) - 1;           // stage size = (2^level) - 1
            // localparam in_size = stage_size << i;          // input size = stage size * 2
            localparam step          = N >> (i + 1);
            localparam c_index_start = 3*step - 1;
            localparam c_index_end   = N - step - 1;
            if (i == exp_stages_no) begin
                // last expansion level works directly on the bit-level p and g
                for (j = c_index_start; j <= c_index_end; j = j + (step << 1)) begin : expansion_stage_op
                    localparam c_base_idx = j - step;
                    grey_carry_operator expansion_stage_op (
                        .p(p[j]), .g({g[j], C[c_base_idx]}), .G(C[j])
                    );
                end
            end
            else begin
                // earlier expansion levels reuse the group signals of a reduction level
                localparam i_red_stage   = $clog2(step);
                localparam start_idx_red = (1 << red_stages_no) - (1 << (red_stages_no - i_red_stage + 1));
                for (j = c_index_start; j <= c_index_end; j = j + (step << 1)) begin : expansion_stage_op
                    localparam c_base_idx = j - step;
                    localparam red_group  = j >> (i_red_stage);
                    localparam red_idx    = start_idx_red + red_group;
                    grey_carry_operator expansion_stage_op (
                        .p(p_inter_red[red_idx]), .g({g_inter_red[red_idx], C[c_base_idx]}), .G(C[j])
                    );
                end
            end
        end
    endgenerate

endmodule

```
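The carry network instantiates `bk_reduction_stage`, which is not listed in this report. As a rough guide to what one reduction level has to do, the following is a minimal sketch under stated assumptions: the port names (`p`, `g`, `P`, `G`) and widths are inferred from the instantiations in the listing, the internal structure is a guess, and the version in the repository may differ. It relies on `grey_carry_operator` and `black_carry_operator` from sections 4 and 5.

```verilog
// Hypothetical sketch of one Brent-Kung reduction level.
// Ports are inferred from how bk_prefix_tree connects to this module;
// this is not the project's actual source.
module bk_reduction_stage #(
    parameter N = 16                 // number of inputs to this level (even)
)(
    input  [N-1:1]   p,              // group propagates; input 0 needs none
    input  [N-1:0]   g,              // group generates; g[0] already includes Cin
    output [N/2-1:0] G,              // merged generates, one per input pair
    output [N/2-1:0] P               // merged propagates; P[0] is never used
);
    // Pair 0 already contains Cin, so a grey cell (generate only) is enough.
    grey_carry_operator pair0 (.p(p[1]), .g(g[1:0]), .G(G[0]));
    assign P[0] = 1'b0;              // assumption: tie the unused bit low

    // Every other pair needs both outputs, so it gets a black cell.
    genvar k;
    generate
        for (k = 1; k < N/2; k = k + 1) begin : pair
            black_carry_operator op (
                .p(p[2*k+1:2*k]),
                .g(g[2*k+1:2*k]),
                .P(P[k]),
                .G(G[k])
            );
        end
    endgenerate
endmodule
```

With `N = 2`, as in the last reduction level, the loop generates no black cells and the module reduces to the single grey cell that produces the carry out of the top bit.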

## 7. 64 adder using Parallel Prefix Adder Brent Kung

### a) RTL schematic



### b) RTL code

```
module ppa_brentkung #(
    parameter N = 64
)(
    input  [N-1:0] x, y,
    input          Cin,
    output [N-1:0] s,
    output         Cout
);
    wire [N-1:0] g, p, c;
    gp_logic  #(.N(N)) gp_logic_inst  (.x(x), .y(y), .g(g), .p(p));
    sum_logic #(.N(N)) sum_logic_inst (.c(c), .p(p), .s(s));
    wire [N-1:0] c_network;          // c_network[i] = carry out of bit i
    bk_prefix_tree #(.N(N)) bk_prefix_tree_inst (.g(g), .p(p), .Cin(Cin), .C(c_network));
    assign Cout = c_network[N-1];
    assign c = {c_network[N-2:0], Cin}; // Concatenate, not slice assignment
endmodule
```

## 8. testbench code for 64 adder

### a) code

```
module ppa_brentkung_tb;

  // -----
  // Parameters
  // -----
  parameter N = 64; // Parameterized Width
  localparam NUM_RANDOM_TESTS = 10000;

  // -----
  // Inputs & Outputs
  // -----
  reg [N-1:0] A;
  reg [N-1:0] B;
  reg Cin;

  wire [N-1:0] Sum;
  wire Cout;

  // -----
  // Variables for Verification
  // -----
  reg [N:0] expected_result; // (N+1)-bit to hold Sum + Carry
  integer i, j;
  integer error_count = 0;

  // -----
  // Instantiate the Unit Under Test (UUT)
  // -----
  ppa_brentkung #(.N(N)) uut (
    .x(A),
    .y(B),
    .Cin(Cin),
    .s(Sum),
    .Cout(Cout)
  );

  // -----
  // Test Procedure
  // -----
  initial begin
    // Initialize
    A = 0; B = 0; Cin = 0;

    $display("=====");
    $display(" Starting Self-Checking Testbench for %0d-bit PPA", N);
    $display("=====");

    // --- 1. Directed Corner Cases ---
    $display("--- Running Directed Corner Cases ---");

    check_case({N{1'b0}}, {N{1'b0}}, 1'b0); // Zero check
    check_case({N{1'b0}}, {N{1'b0}}, 1'b1); // Carry In check
    check_case({N{1'b1}}, {N{1'b0}}, 1'b0); // Max Value check (No Cout)
    check_case({N{1'b1}}, {N{1'b0}}, 1'b1); // Max Value check (With Cout)

    // Alternating Bits logic (1010... + 0101...)
    check_case({(N/2){2'b10}}, {(N/2){2'b01}}, 1'b0);

    // --- 2. Random Verification Loop ---
    $display("\n--- Running %0d Random Test Vectors ---", NUM_RANDOM_TESTS);
    for (i = 0; i < NUM_RANDOM_TESTS; i = i + 1) begin: random_test
      reg [N-1:0] rand_A, rand_B;

      // Robust random generation for widths > 32-bits
      // ($random returns 32 bits, so fill the operands 32 bits at a time)
      for (j = 0; j < N; j = j + 32) begin
        rand_A[j+:32] = $random;
        rand_B[j+:32] = $random;
      end

      check_case(rand_A, rand_B, $random % 2);
    end

    // --- 3. Final Summary ---
    $display("=====");
    if (error_count == 0) begin
      $display(" SUCCESS: All tests passed!");
    end else begin
      $display(" FAILURE: Found %0d mismatches.", error_count);
    end
    $display("=====");

    $finish;
  end

  // -----
  // Tasks: check_case
  // -----
  task check_case;
    input [N-1:0] in_A;
    input [N-1:0] in_B;
    input         in_Cin;
    begin
      A = in_A;
      B = in_B;
      Cin = in_Cin;

      // Behavioral Golden Reference
      expected_result = in_A + in_B + in_Cin;

      #10; // Wait for logic to settle

      // !== also flags X/Z on the outputs, which != would silently pass
      if ({Cout, Sum} !== expected_result) begin
        error_count = error_count + 1;
        $display("[ERROR] Time=%t | A=%h | B=%h | Cin=%b", $time, A, B, Cin);
        $display("           Expected: %h | Got: %h", expected_result, {Cout, Sum});
      end
    end
  endtask

endmodule
```

### b) Waveform



### c) Test-Bench log

```
run all
=====
Starting Self-Checking Testbench for 64-bit PPA
=====
--- Running Directed Corner Cases ---

--- Running 1000 Random Test Vectors ---
=====
SUCCESS: All tests passed!
=====
```

## 12. IMPLEMENTATION

### a) Implementation schematic



## 13. Utilization

| Name            | Slice LUTs (63400) | Slice (15850) | LUT as Logic (63400) | Bonded IOB (210) |
|-----------------|--------------------|---------------|----------------------|------------------|
| ppa_brentkung   | 134                | 47            | 134                  | 194              |

## 14. Timing Reports

### a) Setup

| Name    | Slack (ns) | Levels | Routes | High Fanout | From  | To    | Total Delay (ns) | Logic Delay (ns) | Net Delay (ns) | Requirement (ns) | Source Clock     |
|---------|------------|--------|--------|-------------|-------|-------|------------------|------------------|----------------|------------------|------------------|
| Path 1  | ∞          | 8      | 7      | 6           | x[38] | Cout  | 22.960           | 4.619            | 18.341         | ∞                | input port clock |
| Path 2  | ∞          | 9      | 8      | 6           | x[38] | s[58] | 22.137           | 5.182            | 16.955         | ∞                | input port clock |
| Path 3  | ∞          | 9      | 8      | 6           | x[38] | s[59] | 21.805           | 4.954            | 16.851         | ∞                | input port clock |
| Path 4  | ∞          | 9      | 8      | 6           | x[38] | s[54] | 21.673           | 4.932            | 16.741         | ∞                | input port clock |
| Path 5  | ∞          | 9      | 8      | 6           | x[38] | s[63] | 21.261           | 4.662            | 16.599         | ∞                | input port clock |
| Path 6  | ∞          | 8      | 7      | 6           | x[38] | s[61] | 21.215           | 4.839            | 16.376         | ∞                | input port clock |
| Path 7  | ∞          | 8      | 7      | 6           | x[38] | s[62] | 21.174           | 4.592            | 16.583         | ∞                | input port clock |
| Path 8  | ∞          | 8      | 7      | 6           | x[38] | s[60] | 21.117           | 4.600            | 16.517         | ∞                | input port clock |
| Path 9  | ∞          | 8      | 7      | 6           | x[38] | s[57] | 21.115           | 4.828            | 16.287         | ∞                | input port clock |
| Path 10 | ∞          | 9      | 8      | 6           | x[38] | s[55] | 21.087           | 4.729            | 16.358         | ∞                | input port clock |

### b) Hold

| Name    | Slack (ns) | Levels | Routes | High Fanout | From  | To    | Total Delay (ns) | Logic Delay (ns) | Net Delay (ns) | Requirement (ns) | Source Clock     |
|---------|------------|--------|--------|-------------|-------|-------|------------------|------------------|----------------|------------------|------------------|
| Path 11 | ∞          | 3      | 2      | 4           | y[53] | s[53] | 2.322            | 1.369            | 0.953          | -∞               | input port clock |
| Path 12 | ∞          | 3      | 2      | 5           | y[55] | s[55] | 2.344            | 1.420            | 0.924          | -∞               | input port clock |
| Path 13 | ∞          | 3      | 2      | 5           | y[52] | s[52] | 2.369            | 1.384            | 0.985          | -∞               | input port clock |
| Path 14 | ∞          | 3      | 2      | 4           | y[43] | s[43] | 2.374            | 1.376            | 0.998          | -∞               | input port clock |
| Path 15 | ∞          | 3      | 2      | 4           | y[38] | s[38] | 2.377            | 1.372            | 1.005          | -∞               | input port clock |
| Path 16 | ∞          | 3      | 2      | 4           | y[50] | s[50] | 2.403            | 1.401            | 1.002          | -∞               | input port clock |
| Path 17 | ∞          | 3      | 2      | 4           | y[58] | s[58] | 2.404            | 1.417            | 0.987          | -∞               | input port clock |
| Path 18 | ∞          | 3      | 2      | 4           | y[44] | s[44] | 2.407            | 1.388            | 1.019          | -∞               | input port clock |
| Path 19 | ∞          | 3      | 2      | 6           | y[54] | s[54] | 2.415            | 1.375            | 1.040          | -∞               | input port clock |
| Path 20 | ∞          | 3      | 2      | 5           | y[56] | s[57] | 2.415            | 1.402            | 1.013          | -∞               | input port clock |
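
Every path above is reported with infinite slack because the adder is purely combinational and the Requirement column carries no finite value to check the delays against. One common way to obtain a finite setup slack, sketched here as an assumption and not as part of the submitted design, is to place the adder between input and output registers and constrain the added clock (for example with a `create_clock` constraint in the constraints file). The wrapper name `ppa_brentkung_reg` and the `clk` port are hypothetical.

```verilog
// Hypothetical timing wrapper: registers on both sides of the adder so that
// the tool can time the combinational path against a clock period.
// Not part of the project sources.
module ppa_brentkung_reg #(
    parameter N = 64
)(
    input              clk,
    input  [N-1:0]     x, y,
    input              Cin,
    output reg [N-1:0] s,
    output reg         Cout
);
    reg  [N-1:0] x_q, y_q;   // registered operands
    reg          cin_q;      // registered carry in
    wire [N-1:0] s_d;        // combinational sum from the adder
    wire         cout_d;     // combinational carry out from the adder

    ppa_brentkung #(.N(N)) adder (
        .x(x_q), .y(y_q), .Cin(cin_q), .s(s_d), .Cout(cout_d)
    );

    always @(posedge clk) begin
        x_q   <= x;
        y_q   <= y;
        cin_q <= Cin;
        s     <= s_d;
        Cout  <= cout_d;
    end
endmodule
```

With such a wrapper, the register-to-register path through the carry network becomes the timed path, and the wrapper adds two clock cycles of latency between the inputs and the registered result.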