

# 高等計算機演算法

# Advanced Computer

# Algorithms

Term Projects

2024.05.29

學號:M11215075

姓名:胡劭



# 1. Design:

我選擇的是 FFT network，以下是我的設計以及說明：

## ● Algorithm

我採用的是 radix-4

### 1.16-point radix-4 FFT schematic diagram



## 2. Data represent



### ● Architecture

#### 1. Block diagram

作法: 主要可以切為主要三個區塊



## 2.Processor architecture



### ● Butterfly

#### 1. 完整設計圖

作法: 用三級 pipeline 做，透過 pipeline 跟運算因子在每個 stage 內做運算



#### 2. 加減法器直接用運算，所以只要做乘法的部分

### 3. 乘法器

利用位移方式計算乘法



#### ● SIPO

作法:用 16 顆 DFF，每一顆 34bit，利用一個  
計數器來計數，達到指定數字去找指定的 DFF  
來平行化輸出 data



## ● PISO

作法:跟 SIPO 類似，只是要讓輸出轉回串型輸出



## ● Mux

作法: 使用 flag 控制 data 選擇，counter 計數器的值來做其 Mux 功能，並透過 clk 和 rst\_n 同步和 reset 內部的計數器跟 register



## ● 完整設計圖



# 3. HDL Entry:

## ● Butterfly

### 1.multi16

Verilog code:

```
1  `module multi16(
2
3      input wire [16:0] in_17bit, // 17-bit input data
4      input wire [7:0] in_8bit, // 8-bit input data
5      output wire [16:0] out // 17-bit output data
6
7  );
8
9      wire flag; // determine the sign of the product
10     wire [16:0] in_17bit_b; // store 17-bit true form data
11     wire [7:0] in_8bit_b; // store 8-bit true form data
12     wire [24:0] mul;
13     wire [16:0] mul_b;
14     reg [24:0] neg_mul;
15
16 //***** The following are assign statements *****
17
18 assign in_17bit_b = (in_17bit[16] == 1) ? ~in_17bit[16:0] + 1'b1 : in_17bit;
19 // If in_17bit is a negative number, transform to 2's complement, otherwise remain the same.
20
21 always @ ( in_17bit or in_8bit ) begin
22     case ( in_8bit )
23         8'b00000000: neg_mul = 25'b0;
24         8'b00110001: neg_mul = ( in_17bit_b << 0 ) + ( in_17bit_b << 4 ) + ( in_17bit_b << 5 );
25         8'b11001111: neg_mul = ( in_17bit_b << 0 ) + ( in_17bit_b << 4 ) + ( in_17bit_b << 5 );
26         8'b01011010: neg_mul = ( in_17bit_b << 1 ) + ( in_17bit_b << 3 ) + ( in_17bit_b << 4 ) + ( in_17bit_b << 6 );
27         8'b10100110: neg_mul = ( in_17bit_b << 1 ) + ( in_17bit_b << 3 ) + ( in_17bit_b << 4 ) + ( in_17bit_b << 6 );
28         8'b01110010: neg_mul = ( in_17bit_b << 1 ) + ( in_17bit_b << 2 ) + ( in_17bit_b << 4 ) + ( in_17bit_b << 5 ) + ( in_17bit_b << 6 );
29         8'b10001010: neg_mul = ( in_17bit_b << 1 ) + ( in_17bit_b << 2 ) + ( in_17bit_b << 4 ) + ( in_17bit_b << 5 ) + ( in_17bit_b << 6 );
30         8'b01111111: neg_mul = ( in_17bit_b << 7 );
31         8'b10000001: neg_mul = ( in_17bit_b << 7 );
32     endcase
33 end
34
35 // assign in_8bit_b = ( in_8bit[7] == 1) ? ~in_8bit[7:0] + 1'b1 : in_8bit;
36 // If in_8bit is a negative number, transform to 2's complement, otherwise remain the same.
37 assign flag = in_17bit[16] + in_8bit[7]; // Determine the sign of the product.
38 // assign mul = in_17bit_b[16:0] * in_8bit_b[7:0]; // Calculate the absolute value of the product.
39 assign mul = neg_mul;
40 assign mul_b = {mul[23:15], mul[14:7]}; // Shift, combining integer and fractional parts.
41 assign out = (flag == 1) ? ~mul_b[16:0] + 1'b1 : mul_b; // Output only keep 17 bits.
42
43 endmodule
```

RTL:



## Testbench:

```
1  module multi16_tb( );
2
3      reg          clk_test;
4      reg          rst_n_test;
5      reg [16:0]    in_17bit_test;
6      reg [7:0]     in_8bit_test;
7      wire [16:0]   out_test;
8
9      parameter clk_freq = 10;
10
11 multi16 multi160(.in_17bit(in_17bit_test),
12 | | | | | | | | | | | | .in_8bit(in_8bit_test),
13 | | | | | | | | | | | | .out(out_test)
14 );
15
16 initial begin
17     clk_test = 0;
18     rst_n_test = 0;
19     in_17bit_test = 17'b0;
20     in_8bit_test = 8'b0;
21 end
22
23 always begin
24     #(clk_freq / 2) clk_test = ~ clk_test; // create a 100MHz clock
25     rst_n_test = 1;
26 end
27
28     initial begin
29         #10 begin
30             in_17bit_test = 17'b00001000100001000;
31             in_8bit_test  = 8'b01111111;
32         end
33
34         #100 begin
35             in_17bit_test = 17'b0000000010000000;
36             in_8bit_test  = 8'b01111111;
37         end
38
39     end
40
41 endmodule
```

## Behavioral:



## Post-Route:



## 2. Butterfly

### Verilog code:

```
1 module butterfly(
2   input wire [135:0] calc_in, // The 4 numbers which need to be calculated. Format: in4(Re,Im), in3(Re,Im), in2(Re,Im), in1(Re,Im)
3   input wire [2:0] rotation, // Number of each butterfly computation (8 in total)
4   output wire [135:0] calc_out // The 4 output numbers. Format: out4(Re,Im), out3(Re,Im), out2(Re,Im), out1(Re,Im)
5 );
6
7 parameter para0000 = 8'b00000000; // 0.0000. sin(0pi/8) = sin(8pi/8) = cos(4pi/8) = - cos(4pi/8)
8 parameter para3827 = 8'b00110001; // 0.3827. sin(1pi/8) = sin(7pi/8) = cos(3pi/8) = - cos(5pi/8) 0.0110000111110001010000010010000010110111000000001
9 parameter para3827 = 8'b11001111; // -0.3827. sin(1pi/8) = sin(7pi/8) = cos(3pi/8) = - cos(5pi/8)
10 parameter para7071 = 8'b01011010; // 0.7071. sin(2pi/8) = sin(6pi/8) = cos(2pi/8) = - cos(6pi/8) 0.1011010100000100100000101101110000000011010001110
11 parameter para7071 = 8'b10100110; // -0.7071. sin(2pi/8) = sin(6pi/8) = cos(2pi/8) = - cos(6pi/8)
12 parameter para9239 = 8'b01110110; // 0.9239. sin(3pi/8) = sin(5pi/8) = cos(1pi/8) = - cos(7pi/8) 0.11101100100001001011010111011000110001111110001
13 parameter para9239 = 8'b10001010; // -0.9239. sin(3pi/8) = sin(5pi/8) = cos(1pi/8) = - cos(7pi/8)
14 parameter para1111 = 8'b01111111; // 1.0000. sin(4pi/8) = sin(4pi/8) = cos(0pi/8) = - cos(8pi/8)
15 parameter para1111 = 8'b10000001; // -1.0000.
16
17 reg signed [7:0] rotation_factor1_real; // For input B
18 reg signed [7:0] rotation_factor1_imag; // For input B
19 reg signed [7:0] rotation_factor2_real; // For input C
20 reg signed [7:0] rotation_factor2_imag; // For input C
21 reg signed [7:0] rotation_factor3_real; // For input D
22 reg signed [7:0] rotation_factor3_imag; // For input D
23
24 wire signed [7:0] in_8bit_1_real, in_8bit_1_imag;
25 wire signed [7:0] in_8bit_2_real, in_8bit_2_imag;
26 wire signed [7:0] in_8bit_3_real, in_8bit_3_imag;
27
28 wire signed [33:0] temp_0, temp_1, temp_2, temp_3;
29
30 wire signed [16:0] temp_0_real, temp_0_imag;
31 wire signed [16:0] temp_1_real, temp_1_imag;
32 wire signed [16:0] temp_2_real, temp_2_imag;
33 wire signed [16:0] temp_3_real, temp_3_imag;
34
35 wire signed [16:0] in_17bit_0_real, in_17bit_0_imag;
36 wire signed [16:0] in_17bit_1_real, in_17bit_1_imag;
37 wire signed [16:0] in_17bit_2_real, in_17bit_2_imag;
38   wire signed [16:0] in_17bit_3_real, in_17bit_3_imag;
39
40   wire signed [16:0] out_0_real, out_0_imag;
41   wire signed [16:0] out_1_real, out_1_imag;
42   wire signed [16:0] out_2_real, out_2_imag;
43   wire signed [16:0] out_3_real, out_3_imag;
44
45   wire signed [16:0] temp_2_1_real, temp_2_2_real, temp_2_1_imag, temp_2_2_imag;
46   wire signed [16:0] temp_3_1_real, temp_3_2_real, temp_3_1_imag, temp_3_2_imag;
47   wire signed [16:0] temp_4_1_real, temp_4_2_real, temp_4_1_imag, temp_4_2_imag;
48
49 //***** The following are state machines *****
50
51 // This always part controls signal rotation_factor1_real.
52 always @ ( rotation ) begin
53   case ( rotation )
54     3'b000: rotation_factor1_real = para1111;
55     3'b001: rotation_factor1_real = para1111;
56     3'b010: rotation_factor1_real = para1111;
57     3'b011: rotation_factor1_real = para1111;
58     3'b100: rotation_factor1_real = para1111; // cos(0) = 1 W_16^0
59     3'b101: rotation_factor1_real = para9239; // cos(pi/8) = 0.9239 W_16^1
60     3'b110: rotation_factor1_real = para7071; // cos(pi/4) = 0.7071 W_16^2
61     3'b111: rotation_factor1_real = para3827; // cos(3pi/8) = 0.3827 W_16^3
62   endcase
63 end
64
65 // This always part controls signal rotation_factor1_imag.
66 always @ ( rotation ) begin
67   case ( rotation )
68     3'b000: rotation_factor1_imag = para0000;
69     3'b001: rotation_factor1_imag = para0000;
70     3'b010: rotation_factor1_imag = para0000;
71     3'b011: rotation_factor1_imag = para0000;
```

```

72      3'b100: rotation_factor1_imag = para0000; // - sin(0) = 0 W_16^0
73      3'b101: rotation_factor1_imag = parn3827; // - sin(pi/8) = - 0.3827 W_16^1
74      3'b110: rotation_factor1_imag = parn7071; // - sin(pi/4) = - 0.7071 W_16^2
75      3'b111: rotation_factor1_imag = parn9239; // - sin(3pi/8) = - 0.9239 W_16^3
76      endcase
77  end
78
79 // This always part controls signal rotation_factor2_real.
80 always @ ( rotation ) begin
81   case ( rotation )
82     3'b000: rotation_factor2_real = para1111;
83     3'b001: rotation_factor2_real = para1111;
84     3'b010: rotation_factor2_real = para1111;
85     3'b011: rotation_factor2_real = para1111;
86     3'b100: rotation_factor2_real = para1111; // cos(0) = 1 W_16^0
87     3'b101: rotation_factor2_real = parn7071; // cos(pi/4) = 0.7071 W_16^2
88     3'b110: rotation_factor2_real = para0000; // cos(pi/2) = 0 W_16^4
89     3'b111: rotation_factor2_real = parn7071; // cos(3pi/4) = - 0.7071 W_16^6
90   endcase
91 end
92
93 // This always part controls signal rotation_factor2_imag.
94 always @ ( rotation ) begin
95   case ( rotation )
96     3'b000: rotation_factor2_imag = para0000;
97     3'b001: rotation_factor2_imag = para0000;
98     3'b010: rotation_factor2_imag = para0000;
99     3'b011: rotation_factor2_imag = para0000;
100    3'b100: rotation_factor2_imag = para0000; // - sin(0) = 0 W_16^0
101    3'b101: rotation_factor2_imag = parn7071; // - sin(pi/8) = - 0.7071 W_16^2
102    3'b110: rotation_factor2_imag = para1111; // - sin(pi/4) = - 1 W_16^4
103    3'b111: rotation_factor2_imag = parn7071; // - sin(3pi/8) = - 0.7071 W_16^6
104  endcase
105 end

```

```

107 // This always part controls signal rotation_factor3_real.
108 always @ ( rotation ) begin
109   case ( rotation )
110     3'b000: rotation_factor3_real = para1111;
111     3'b001: rotation_factor3_real = para1111;
112     3'b010: rotation_factor3_real = para1111;
113     3'b011: rotation_factor3_real = para1111;
114     3'b100: rotation_factor3_real = para1111; // cos(0)      = 1      W_16^0
115     3'b101: rotation_factor3_real = para3827; // cos(3pi/8) = 0.7071 W_16^3
116     3'b110: rotation_factor3_real = parn7071; // cos(6pi/8) = 0      W_16^6
117     3'b111: rotation_factor3_real = parn9239; // cos(9pi/8) = -0.7071 W_16^9
118   endcase
119 end
120
121 // This always part controls signal rotation_factor3_imag.
122 always @ ( rotation ) begin
123   case ( rotation )
124     3'b000: rotation_factor3_imag = para0000;
125     3'b001: rotation_factor3_imag = para0000;
126     3'b010: rotation_factor3_imag = para0000;
127     3'b011: rotation_factor3_imag = para0000;
128     3'b100: rotation_factor3_imag = para0000; // -sin(0)      = -0      W_16^0
129     3'b101: rotation_factor3_imag = parn9239; // -sin(3pi/8) = -0.7071 W_16^3
130     3'b110: rotation_factor3_imag = parn7071; // -sin(6pi/8) = -0.1    W_16^6
131     3'b111: rotation_factor3_imag = para3827; // -sin(9pi/8) = 0.7071 W_16^9
132   endcase
133 end
134
135 //***** The following are instantiations *****
136
137 multi16 multiBRR( .in_17bit(in_17bit_1_real),
138   .in_8bit(in_8bit_1_real),
139   .out(temp_2_1_real)
140 );

```

```

141
142     multi16 multiBII( .in_17bit(in_17bit_1_imag),
143                         .in_8bit(in_8bit_1_imag),
144                         .out(temp_2_2_real)
145                     );
146
147     multi16 multiBRI( .in_17bit(in_17bit_1_real),
148                         .in_8bit(in_8bit_1_imag),
149                         .out(temp_2_1_imag)
150                     );
151
152     multi16 multiBIR( .in_17bit(in_17bit_1_imag),
153                         .in_8bit(in_8bit_1_real),
154                         .out(temp_2_2_imag)
155                     );
156
157 //*****
158
159     multi16 multiCRR( .in_17bit(in_17bit_2_real),
160                         .in_8bit(in_8bit_2_real),
161                         .out(temp_3_1_real)
162                     );
163
164     multi16 multiCII( .in_17bit(in_17bit_2_imag),
165                         .in_8bit(in_8bit_2_imag),
166                         .out(temp_3_2_real)
167                     );
168
169     multi16 multiCRI( .in_17bit(in_17bit_2_real),
170                         .in_8bit(in_8bit_2_imag),
171                         .out(temp_3_1_imag)
172                     );
173

174     multi16 multiCIR( .in_17bit(in_17bit_2_imag),
175                         .in_8bit(in_8bit_2_real),
176                         .out(temp_3_2_imag)
177                     );
178
179 //*****
180
181     multi16 multiDRR( .in_17bit(in_17bit_3_real),
182                         .in_8bit(in_8bit_3_real),
183                         .out(temp_4_1_real)
184                     );
185
186     multi16 multiDII( .in_17bit(in_17bit_3_imag),
187                         .in_8bit(in_8bit_3_imag),
188                         .out(temp_4_2_real)
189                     );
190
191     multi16 multiDRI( .in_17bit(in_17bit_3_real),
192                         .in_8bit(in_8bit_3_imag),
193                         .out(temp_4_1_imag)
194                     );
195
196     multi16 multiDIR( .in_17bit(in_17bit_3_imag),
197                         .in_8bit(in_8bit_3_real),
198                         .out(temp_4_2_imag)
199                     );
200
201 //***** The following are assign statements *****
202
203     assign in_17bit_0_real = calc_in[33:17];           // A (real)
204     assign in_17bit_0_imag = calc_in[16:0];           // A (imag)
205
206     assign in_8bit_1_real = rotation_factor1_real;   // rotation factor for B (real)

```

```

207 assign in_8bit_1_imag = rotation_factor1_imag; // rotation factor for B (imag)
208 assign in_17bit_1_real = calc_in[67:51]; // B (real)
209 assign in_17bit_1_imag = calc_in[50:34]; // B (imag)
210
211 assign in_8bit_2_real = rotation_factor2_real; // rotation factor for C (real)
212 assign in_8bit_2_imag = rotation_factor2_imag; // rotation factor for C (imag)
213 assign in_17bit_2_real = calc_in[101:85]; // C (real)
214 assign in_17bit_2_imag = calc_in[84:68]; // C (imag)
215
216 assign in_8bit_3_real = rotation_factor3_real; // rotation factor for D (real)
217 assign in_8bit_3_imag = rotation_factor3_imag; // rotation factor for D (imag)
218 assign in_17bit_3_real = calc_in[135:119]; // D (real)
219 assign in_17bit_3_imag = calc_in[118:102]; // D (imag)
220
221 //*****
222
223 assign temp_0_real = {in_17bit_0_real[16], in_17bit_0_real}; // A real
224 assign temp_0_imag = {in_17bit_0_imag[16], in_17bit_0_imag}; // A imag
225
226 assign temp_1_real = temp_2_1_real - temp_2_2_real; //B * W^P real
227 assign temp_1_imag = temp_2_1_imag + temp_2_2_imag; //B * W^P imag
228
229 assign temp_2_real = temp_3_1_real - temp_3_2_real; //C * W^{2P} real
230 assign temp_2_imag = temp_3_1_imag + temp_3_2_imag; //C * W^{2P} imag
231
232 assign temp_3_real = temp_4_1_real - temp_4_2_real; //D * W^{3P} real
233 assign temp_3_imag = temp_4_1_imag + temp_4_2_imag; //D * W^{3P} imag
234
235
236 assign temp_0 = {temp_0_real,temp_0_imag}; // A
237 assign temp_1 = {temp_1_real, temp_1_imag}; // B * Wp
238 assign temp_2 = {temp_2_real, temp_2_imag}; // C * W2p
239 assign temp_3 = {temp_3_real, temp_3_imag}; // D * W3p

243 assign out_0_real = temp_0_real + temp_1_real + temp_2_real + temp_3_real;
244 assign out_0_imag = temp_0_imag + temp_1_imag + temp_2_imag + temp_3_imag;
245
246 assign out_1_real = temp_0_real - temp_1_real + temp_2_real - temp_3_real;
247 assign out_1_imag = temp_0_imag - temp_1_imag + temp_2_imag - temp_3_imag;
248
249 assign out_2_real = temp_0_real - temp_1_imag - temp_2_real + temp_3_imag;
250 assign out_2_imag = temp_0_imag + temp_1_real - temp_2_imag - temp_3_real;
251 assign out_3_real = temp_0_real + temp_1_imag - temp_2_real - temp_3_imag;
252 assign out_3_imag = temp_0_imag - temp_1_real - temp_2_imag + temp_3_real;
253
254 assign calc_out = { out_2_real, out_2_imag,
255 | | | | out_1_real, out_1_imag,
256 | | | | out_3_real, out_3_imag,
257 | | | | out_0_real, out_0_imag };
258
259 endmodule

```

HDL:



Testbench:

```
1  module butterfly_tb();
2
3    reg          clk_test;
4    reg          rst_n_test;
5    reg [135:0] calc_in_test;
6    reg [2:0]   rotation_test;
7    wire [135:0] calc_out_test;
8
9    reg [135:0] temp1, temp2, temp3, temp4;
10
11   parameter clk_freq = 10; // or 8 for 125 MHz
12
13   butterfly butterfly0(.calc_in(calc_in_test),
14                         .rotation(rotation_test),
15                         .calc_out(calc_out_test)
16 );
17
18   initial begin
19     clk_test = 0;
20     rst_n_test = 0;
21     calc_in_test = 135'b0;
22     rotation_test = 3'b0;
23   end
24
25   always begin
26     #(clk_freq / 2) clk_test = ~ clk_test; // create a 100 MHz clock
27     rst_n_test = 1; // enable
28   end
```

```

30 |  initial begin
31 | //***** The following is the 16 inputs *****
32 | $display("\n\nLoad Data\n");
33 | #10 begin
34 |   calc_in_test = 136'b000000001000000000000000000000001000000000000000000000000000000;
35 |   //          | x[12]Real || x[12]Imag || x[8]Real || x[8]Imag
36 |   rotation_test = 3'b000;
37 | end
38 |
39 | #10 temp1 = calc_out_test;
40 |
41 | #10 begin
42 |   calc_in_test = 136'b000000001000000000000000000000001000000000000000000000000000000;
43 |   //          | x[13]Real || x[13]Imag || x[9]Real || x[9]Imag
44 |   rotation_test = 3'b001;
45 | end
46 |
47 | #10 temp2 = calc_out_test;
48 |
49 | #10 begin
50 |   calc_in_test = 136'b000000001000000000000000000000001000000000000000000000000000000;
51 |   //          | x[14]Real || x[14]Imag || x[10]Real || x[10]Imag
52 |   rotation_test = 3'b010;
53 | end

1000000001100000000000000000000000000000000000000000000000000000000000000000000000000000;
||  x[4]Real || x[4]Imag || x[0]Real || x[0]Imag |

1000000001000000000000000000000000000000000000000000000000000000000000000000000000000000;
||  x[5]Real || x[5]Imag || x[1]Real || x[1]Imag |

100000000100000000000000000000000000000000000000000000000000000000000000000000000000000000;
||  x[6]Real || x[6]Imag || x[2]Real || x[2]Imag |

```



```

80  v #10 begin
81    | calc_in_test = {temp4[67:34], temp3[67:34], temp2[67:34], temp1[67:34]};
82    | rotation_test = 3'b101;
83  end
84
85  v #10 begin
86    | $display("[1] %b_%b_%b %b_%b_%b\n", calc_out_test[33], calc_out_test[32:25],
87    | $display("[5] %b_%b_%b %b_%b_%b\n", calc_out_test[67], calc_out_test[66:59],
88    | $display("[9] %b_%b_%b %b_%b_%b\n", calc_out_test[101], calc_out_test[100:93],
89    | $display("[13] %b_%b_%b %b_%b_%b\n", calc_out_test[135], calc_out_test[134:127],
90  end
91
92  v #10 begin
93    | calc_in_test = {temp4[101:68], temp3[101:68], temp2[101:68], temp1[101:68]};
94    | rotation_test = 3'b110;
95  end
96
97  v #10 begin
98    | $display("[2] %b_%b_%b %b_%b_%b\n", calc_out_test[33], calc_out_test[32:25],
99    | $display("[6] %b_%b_%b %b_%b_%b\n", calc_out_test[67], calc_out_test[66:59],
100   | $display("[10] %b_%b_%b %b_%b_%b\n", calc_out_test[101], calc_out_test[100:93],
101   | $display("[14] %b_%b_%b %b_%b_%b\n", calc_out_test[135], calc_out_test[134:127],
102  end

calc_out_test[24:17], calc_out_test[16], calc_out_test[15:8], calc_out_test[7:0]);
calc_out_test[58:51], calc_out_test[50], calc_out_test[49:42], calc_out_test[41:34]);
calc_out_test[92:85], calc_out_test[84], calc_out_test[83:76], calc_out_test[75:68]);
calc_out_test[126:119], calc_out_test[118], calc_out_test[117:110], calc_out_test[109:102]);

calc_out_test[24:17], calc_out_test[16], calc_out_test[15:8], calc_out_test[7:0]);
calc_out_test[58:51], calc_out_test[50], calc_out_test[49:42], calc_out_test[41:34]);
calc_out_test[92:85], calc_out_test[84], calc_out_test[83:76], calc_out_test[75:68]);
calc_out_test[126:119], calc_out_test[118], calc_out_test[117:110], calc_out_test[109:102]);

```

```

104      #10 begin
105          calc_in_test = {temp4[135:102], temp3[135:102], temp2[135:102], temp1[135:102]};
106          rotation_test = 3'b111;
107      end
108
109      #10 begin
110          $display("[3] %b_%b_%b %b_%b_%b\n", calc_out_test[33], calc_out_test[32:25],
111          $display("[7] %b_%b_%b %b_%b_%b\n", calc_out_test[67], calc_out_test[66:59],
112          $display("[11] %b_%b_%b %b_%b_%b\n", calc_out_test[101], calc_out_test[100:93],
113          $display("[15] %b_%b_%b %b_%b_%b\n", calc_out_test[135], calc_out_test[134:127],
114          end
115
116      #100 $stop;
117
118  end
119
120 endmodule

calc_out_test[24:17], calc_out_test[16], calc_out_test[15:8], calc_out_test[7:0]);
calc_out_test[58:51], calc_out_test[50], calc_out_test[49:42], calc_out_test[41:34]);
calc_out_test[92:85], calc_out_test[84], calc_out_test[83:76], calc_out_test[75:68]);
calc_out_test[126:119], calc_out_test[118], calc_out_test[117:110], calc_out_test[109:102]);

```

## Behavioral:



## Post-Route:



## ● SIPO

### Verilog code:

```

1  module s_p(
2    input wire clk,
3    input wire rst_n,
4    input wire [33:0] data_in_1,
5    output reg [135:0] data_out_1,
6    output reg s_p_flag_out
7  );
8  reg [33:0] R0;
9  reg [33:0] R1;
10 reg [33:0] R2;
11 reg [33:0] R3;
12 reg [33:0] R4;
13 reg [33:0] R5;
14 reg [33:0] R6;
15 reg [33:0] R7;
16 reg [33:0] R8;
17 reg [33:0] R9;
18 reg [33:0] R10;
19 reg [33:0] R11;
20 reg [33:0] R12;
21 reg [33:0] R13;
22 reg [33:0] R14;
23 reg [33:0] R15;

```

```

25     reg [3:0] counter;
26
27     reg s_p_flag_mux;
28 // This always part controls signal counter.
29     always @ ( posedge clk or negedge rst_n ) begin
30         if ( !rst_n )
31             counter <= 4'b0;
32         else if (counter == 4'b1111)
33             counter <= 4'b0;
34         else
35             counter <= counter + 4'b0001;
36     end
37
38 // This always part controls signal s_p_flag_out.
39     always @ ( posedge clk or negedge rst_n ) begin
40         if ( !rst_n )
41             s_p_flag_out <= 0;
42
43         else if (counter == 4'b1100)
44             s_p_flag_out <= 1'b1;
45         else
46             s_p_flag_out <= 1'b0;
47     end
48
49 // This always part controls signal s_p_flag_mux.
50     always @ ( posedge clk or negedge rst_n ) begin
51         if ( !rst_n )
52             s_p_flag_mux <= 0;
53         else begin
54             case ( counter )
55                 4'b0000: s_p_flag_mux <= 1'b1;
56                 4'b1101: s_p_flag_mux <= 1'b1;
57                 4'b1110: s_p_flag_mux <= 1'b1;
58                 4'b1111: s_p_flag_mux <= 1'b1;
59                 default: s_p_flag_mux <= 1'b0;

```

---

```

60     |   endcase
61   end
62 end
63
64 // This always part controls signal data_in_1. or negedge rst_n
65
66 // This always part controls signal data_out_1. or posedge clk
67 begin
68   case ( counter )
69     4'b0000: data_out_1 = {R15, R11, R7, R3};
70     4'b1101: data_out_1 = {R12, R8, R4, R0};
71     4'b1110: data_out_1 = {R13, R9, R5, R1};
72     4'b1111: data_out_1 = {R14, R10, R6, R2};
73   endcase
74 end
75
76 // This always part controls signal data_in_1. or posedge clk
77 begin
78   case(counter)
79     4'b0000: R0 <= data_in_1;
80     4'b0001: R1 <= data_in_1;
81     4'b0010: R2 <= data_in_1;
82     4'b0011: R3 <= data_in_1;
83     4'b0100: R4 <= data_in_1;
84     4'b0101: R5 <= data_in_1;
85     4'b0110: R6 <= data_in_1;
86     4'b0111: R7 <= data_in_1;
87     4'b1000: R8 <= data_in_1;
88     4'b1001: R9 <= data_in_1;
89     4'b1010: R10 <= data_in_1;
90     4'b1011: R11 <= data_in_1;
91     4'b1100: R12 <= data_in_1;
92     4'b1101: R13 <= data_in_1;
93     4'b1110: R14 <= data_in_1;
94     4'b1111: R15 <= data_in_1;
95   endcase
96 end
97 endmodule

```

HDL:



Testbench:

```
24  module s_p_tb;
25
26  // Inputs
27  reg clk;
28  reg rst_n;
29  reg [33:0] data_in_1;
30
31  // Outputs
32  wire [135:0] data_out_1;
33  wire s_p_flag_out;
34
35  // Instantiate the Unit Under Test (UUT)
36  s_p uut (
37    .clk(clk),
38    .rst_n(rst_n),
39    .data_in_1(data_in_1),
40    .data_out_1(data_out_1),
41    .s_p_flag_out(s_p_flag_out)
42 );
```

```

44  initial begin
45      // Initialize Inputs
46      clk = 0;
47      rst_n = 0;
48      data_in_1 = 0;
49
50      // Wait 100 ns for global reset to finish
51      #100;
52
53      // Reset sequence
54      rst_n = 1;
55      #10;
56      rst_n = 0;
57      #10;
58      rst_n = 1;
59
60      // Clocking and data input
61      repeat (16) begin
62          // Toggle clock
63          clk = ~clk;
64          #5;
65
66          // Assign data_in_1 with some test data
67          data_in_1 = $random;
68
69          // Display input and output for debugging
70          $display("Time=%t, clk=%b, rst_n=%b, data_in_1=%h",
71          end
72
73      // End simulation
74      $finish;
75  end
76
77  endmodule

data_out_1=%h, s_p_flag_out=%b", $time, clk, rst_n, data_in_1, data_out_1, s_p_flag_out);

```

## Behavioral:



## Post-Route:



## ● PISO

Verilog code:

```
1  module p_s(
2      input wire clk,
3      input wire rst_n,
4      input wire [135:0] data_in_3,
5      input wire p_s_flag_in,
6      output reg [33:0] data_out_3
7  );
8      reg [33:0] R0;
9      reg [33:0] R1;
10     reg [33:0] R2;
11     reg [33:0] R3;
12     reg [33:0] R4;
13     reg [33:0] R5;
14     reg [33:0] R6;
15     reg [33:0] R7;
16     reg [33:0] R8;
17     reg [33:0] R9;
18     reg [33:0] R10;
19     reg [33:0] R11;
20     reg [33:0] R12;
21     reg [33:0] R13;
22     reg [33:0] R14;
23     reg [33:0] R15;
24
25     reg [1:0] counter_1;
26
27     reg [3:0] counter_2;
28
29     reg p_s_flag_out;
30
31 // This always part controls signal counter_1.
32
33     always @ ( posedge clk or negedge rst_n ) begin
34
35         if ( !rst_n )
36             | counter_1 <= 2'b0;
37         else if ( counter_1 == 2'b11 )
38             | counter_1 <= 2'b0;
39         else
40             | counter_1 <= counter_1 + 2'b01;
41     end
42
```

```

43 // This always part controls registers. 60
44   always @ ( posedge clk ) begin 61
45     if ( !p_s_flag_in ) begin 62
46       case ( counter_1 ) 63
47         2'b10: begin 64
48           R0 <= data_in_3[33:0]; 65
49           R4 <= data_in_3[67:34]; 66
50           R8 <= data_in_3[101:68]; 67
51           R12 <= data_in_3[135:102]; 68
52         end 69
53         2'b11: begin 70
54           R1 <= data_in_3[33:0]; 71
55           R5 <= data_in_3[67:34]; 72
56           R9 <= data_in_3[101:68]; 73
57           R13 <= data_in_3[135:102]; 74
58         end
59       end
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76 // This always part controls signal counter_2.
77
78   always @ ( posedge clk or negedge rst_n ) begin
79
80     if ( !rst_n )
81       counter_2 <= 4'b0;
82     else if ( counter_2 == 4'b1111 )
83       counter_2 <= 4'b0;
84     else
85       counter_2 <= counter_2 + 4'b0001;
86   end
87
88 // This always part controls p_s_flag_out.
89
90   always @ ( posedge clk or negedge rst_n ) begin
91     if ( !rst_n )
92       p_s_flag_out <= 1'b0;
93     else if ( !p_s_flag_in )
94       p_s_flag_out <= 1'b1;
95     else
96       p_s_flag_out <= p_s_flag_out;
97   end
98
99
100
101  always @ ( posedge clk ) begin
102    if ( p_s_flag_out ) begin
103      case( counter_2 )
104        4'b0000: data_out_3 <= R7;
105        4'b0001: data_out_3 <= R11;
106        4'b0010: data_out_3 <= R15;
107        4'b0011: data_out_3 <= R0;
108        4'b0100: data_out_3 <= R4;
109        4'b0101: data_out_3 <= R8;
110        4'b0110: data_out_3 <= R12;
111        4'b0111: data_out_3 <= R1;
112        4'b1000: data_out_3 <= R5;
113        4'b1001: data_out_3 <= R9;
114        4'b1010: data_out_3 <= R13;
115        4'b1011: data_out_3 <= R2;
116        4'b1100: data_out_3 <= R6;
117        4'b1101: data_out_3 <= R10;
118        4'b1110: data_out_3 <= R14;
119        4'b1111: data_out_3 <= R3;
120      endcase
121    end
122  end

```

HDL:



Testbench:

```
25  module p_s_tb;
26
27      // Inputs
28      reg clk;
29      reg rst_n;
30      reg [135:0] data_in_3;
31      reg p_s_flag_in;
32
33      // Outputs
34      wire [33:0] data_out_3;
35
36      // Instantiate the Unit Under Test (UUT)
37      p_s uut (
38          .clk(clk),
39          .rst_n(rst_n),
40          .data_in_3(data_in_3),
41          .p_s_flag_in(p_s_flag_in),
42          .data_out_3(data_out_3)
43      );
```

```

45      initial begin
46          // Initialize Inputs
47          clk = 0;
48          rst_n = 0;
49          data_in_3 = 0;
50          p_s_flag_in = 0;
51
52          // Wait 100 ns for global reset to finish
53          #100;
54
55          // Add stimulus here
56          // Reset sequence
57          rst_n = 1;
58          #10;
59          rst_n = 0;
60          #10;
61          rst_n = 1;
62
63
64      repeat (16) begin
65          // Toggle clock
66          clk = ~clk;
67          #5;
68
69          // Assign data_in_3 with some test data
70          data_in_3 = $random;
71
72          // Toggle p_s_flag_in every other clock cycle
73          p_s_flag_in = clk;
74
75          // Display input and output for debugging
76          $display("Time=%t, clk=%b, rst_n=%b, data_in_3=%h, p_s_flag_in=%b",
77      end
78
79      // End simulation
80      $finish;
81  end
82
83 endmodule

```

data\_out\_3=%h", \$time, clk, rst\_n, data\_in\_3, p\_s\_flag\_in, data\_out\_3);

## Behavioral:



## Post-Route:



## ● ctrl

Verilog code:

```
1  < module ctrl(
2    input  wire      clk,
3    input  wire      rst_n,
4    input  wire      s_p_flag_in,
5    output reg       mux_flag,
6    output reg [2:0] rotation,
7    output reg       demux_flag
8  );
9
10   parameter STOP = 3'b0;
11   parameter MUX_IDLE = 1'b0;
12   parameter ROT_IDLE = 3'b0;
13   parameter DEMUX_IDLE = 1'b0;
14   parameter S_P_SEL_0 = 1'b0;
15   parameter S_P_SEL_1 = 1'b0;
16   parameter S_P_SEL_2 = 1'b0;
17   parameter S_P_SEL_3 = 1'b0;
18   parameter REG_SEL_0 = 1'b1;
19   parameter REG_SEL_1 = 1'b1;
20   parameter REG_SEL_2 = 1'b1;
21   parameter REG_SEL_3 = 1'b1;
22   parameter P_S_SEL_0 = 1'b0;
23   parameter P_S_SEL_1 = 1'b0;
```

```

23     parameter P_S_SEL_1 = 1'b0;
24     parameter P_S_SEL_2 = 1'b0;
25     parameter P_S_SEL_3 = 1'b0;
26
27     reg[2:0] core_tick; // count how many sequential logic
28
29 // This always part controls signal core_tick.
30     always @ ( posedge clk or negedge rst_n) begin
31         if ( !rst_n )
32             | core_tick <= STOP; // 3 bits
33         else if ( core_tick == STOP ) begin
34             case ( s_p_flag_in )
35                 1'b0: core_tick <= STOP;
36                 1'b1: core_tick <= core_tick + 1;
37             endcase
38         end else
39             core_tick <= core_tick + 1;
40     end
41
42 // This always part controls signal mux_flag.
43     always @ ( posedge clk or negedge rst_n ) begin
44         if ( !rst_n )
45             | mux_flag <= MUX_IDLE;
46         else begin
47             case ( core_tick )
48                 3'b000: mux_flag <= S_P_SEL_0; // Com
49                 3'b001: mux_flag <= S_P_SEL_1; // 15
50                 3'b010: mux_flag <= S_P_SEL_2; // 16
51                 3'b011: mux_flag <= S_P_SEL_3; // 17
52                 3'b100: mux_flag <= REG_SEL_0; // 18
53                 3'b101: mux_flag <= REG_SEL_1; // 19
54                 3'b110: mux_flag <= REG_SEL_2; // 20
55                 3'b111: mux_flag <= REG_SEL_3; // 21
56             // If MUX/DEMUX has 3 OPTIONS (SP/PS,
57             endcase
58         end
59     end
60
61 // This always part controls signal demux_flag
62     always @ ( posedge clk or negedge rst_n ) be
63         if ( !rst_n )
64             | demux_flag <= DEMUX_IDLE;
65         else begin
66             | case ( core_tick )

```

```

67      3'b000: demux_flag <= REG_SEL_0; // Cor
68      3'b001: demux_flag <= REG_SEL_1; // 15
69      3'b010: demux_flag <= REG_SEL_2; // 16
70      3'b011: demux_flag <= REG_SEL_3; // 17
71      3'b100: demux_flag <= P_S_SEL_0; // 18
72      3'b101: demux_flag <= P_S_SEL_1; // 19
73      3'b110: demux_flag <= P_S_SEL_2; // 20
74      3'b111: demux_flag <= P_S_SEL_3; // 21
75      // If MUX/DEMUX has 3 OPTIONS (SP/PS, RI
76      endcase
77  end
78 end
79
80 // This always part controls signal rotation.
81 always @ ( posedge clk or negedge rst_n ) begin
82   if ( !rst_n )
83     rotation <= ROT_IDLE;
84   else
85     rotation <=core_tick;
86 end
87
88 endmodule

```

HDL:



## Testbench:

```
25  module counter_tb;
26
27  // Inputs
28  reg clk;
29  reg rst_n;
30
31  // Outputs
32  wire [3:0] counter;
33
34  // Instantiate the Unit Under Test (UUT)
35  counter uut (
36      .clk(clk),
37      .rst_n(rst_n),
38      .counter(counter)
39 );
40
41  initial begin
42      // Initialize Inputs
43      clk = 0;
44      rst_n = 0;
45
46      #100;
47
48      // Reset sequence
49      rst_n = 1;
50      #10;
51      rst_n = 0;
52      #10;
53      rst_n = 1;
54
55      // Test counter increment
56      repeat (20) begin
57          #10;
58          $display("Time=%t, counter=%d", $time, counter);
59      end
60
61      // End simulation
62      $finish;
63  end
64
65
66  // Clock generation
67  always #5 clk = ~clk;
```

## Behavioral:



## Post-Route:



## ● Buffer storage

Verilog code:

```

1  module reg1(
2    input wire clk,
3    input wire rst_n,
4    input wire [135:0] data_in_2,
5    input wire reg_datain_flag,
6    output reg [135:0] data_out_2
7  );
8  reg reg_flag_mux; // control
9
10 reg [33:0] R0;
11 reg [33:0] R1;
12 reg [33:0] R2;
13 reg [33:0] R3;
14 reg [33:0] R4;
15 reg [33:0] R5;
16 reg [33:0] R6;
17 reg [33:0] R7;
18 reg [33:0] R8;
19 reg [33:0] R9;
20 reg [33:0] R10;
21 reg [33:0] R11;
22 reg [33:0] R12;
23 reg [33:0] R13;
24
25
26
27
28
29
30 // This always part controls signal
31 always @ ( posedge clk ) begin
32   if ( !rst_n )
33     counter1 <= 2'b00;
34   else if ( reg_datain_flag )
35     counter1 <= counter1 + 1'b1;
36 end
37
38 // This always part controls registers
39 always @ ( posedge clk ) begin
40   if ( reg_datain_flag )begin
41     case ( counter1 )
42       2'b01: begin
43         R0 <= data_in_2[33:0];
44         R1 <= data_in_2[67:34];
45         R2 <= data_in_2[101:68];
46
47       end
48       2'b10: begin
49         R4 <= data_in_2[33:0];
50         R5 <= data_in_2[67:34];
51         R6 <= data_in_2[101:68];
52         R7 <= data_in_2[135:102];
53       end
54       2'b11: begin
55         R8 <= data_in_2[33:0];
56         R9 <= data_in_2[67:34];
57         R10 <= data_in_2[101:68];
58         R11 <= data_in_2[135:102];
59       end
60       2'b00: begin
61         R12 <= data_in_2[33:0];
62         R13 <= data_in_2[67:34];
63         R14 <= data_in_2[101:68];
64         R15 <= data_in_2[135:102];
65       end
66     endcase
67   end
68
69
70
71   always @ ( posedge clk ) begin
72     if ( !rst_n )
73       reg_flag_mux <= 1'b0;
74     else if ( counter1 == 2'b11 )
75       reg_flag_mux <= 1'b1;
76     else if ( counter2 == 2'b11 )
77       reg_flag_mux <= 1'b0;
78   end
79
80   // This always part controls signal
81   always @ ( posedge clk ) begin
82     if ( !rst_n )
83       counter2 <= 2'b00;
84     else if ( reg_flag_mux )
85       counter2 <= counter2 + 2'b01;
86   end
87

```

```

89      always @ ( posedge clk ) begin
90          if ( reg_flag_mux ) begin
91              case ( counter2 )
92                  2'b00: data_out_2 <= {R12, R8, R4, R0};
93                  2'b01: data_out_2 <= {R13, R9, R5, R1};
94                  2'b10: data_out_2 <= {R14, R10, R6, R2};
95                  2'b11: data_out_2 <= {R15, R11, R7, R3};
96              endcase
97          end
98      end
99
100     endmodule

```

HDL:



## Testbench:

```

45   initial begin
46     // Initialize Inputs
47     clk = 0;
48     rst_n = 0;
49     data_in_2 = 0;
50     reg_datain_flag = 0;
51
52     // Wait 100 ns for g:
53     #10;
54
55     // Reset sequence
56     rst_n = 1;
57     #10;
58     rst_n = 0;
59     #10;
60     rst_n = 1;
61
62     // Test case 1: Check
63     reg_datain_flag = 1;
64     data_in_2 = 136'h0123456789ABCDEF;
65     #10;
66
67     data_in_2 = 136'hfedcba9876543210fedcba9876543210;
68     #10;
69     data_in_2 = 136'h0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f;
70     #10;
71     data_in_2 = 136'hf0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0;
72     #10;
73     reg_datain_flag = 0;
74
75     // Check output data after loading
76     #40;
77     $display("Output data: %h", data_out_2);
78
79     // End simulation
80     $finish;
81   end
82
83   always #5 clk = ~clk;
84
85   endmodule

```

## Behavioral:



## Post-Route:



## ● Mux

### 1. Smux

Verilog code:

```
1  ^ module smux(
2      input mux_flag,
3      input [135:0] data_in_1,
4      input [135:0] data_in_2,
5      input [135:0] data_in_3,
6      input [3:0] scounter,
7      output reg [135:0] data_out
8  );
9  ^ reg [33:0] R1, R2, R3, R4;
10 ^ always @(*) begin
11    case(scounter)
12      2: data_out <= {R4, R3, R2, R1};
13      14: R1 <= data_in_3[33:0];
14      15: R2 <= data_in_3[33:0];
15      0: R3 <= data_in_3[33:0];
16      1: R4 <= data_in_3[33:0];
17      default: data_out <= data_out;
18    endcase
19    data_out <= mux_flag ? data_in_1 : data_in_2;
20  end
21
22 endmodule
```

HDL:



Testbench:

```
25  module smux_tb;
26
27  // Inputs
28  reg mux_flag;
29  reg [135:0] data_in_1;
30  reg [135:0] data_in_2;
31  reg [135:0] data_in_3;
32  reg [3:0] counter;
33
34  // Outputs
35  wire [135:0] data_out;
36
37  // Instantiate the Unit Under Test (UUT)
38  smux uut (
39    .mux_flag(mux_flag),
40    .data_in_1(data_in_1),
41    .data_in_2(data_in_2),
42    .data_in_3(data_in_3),
43    .counter(counter),
44    .data_out(data_out)
45  );
```

```

47  initial begin
48      // Initialize Inputs
49      mux_flag = 0;
50      data_in_1 = 0;
51      data_in_2 = 0;
52      data_in_3 = 0;
53      counter = 0;
54
55      // Wait 100 ns for global reset to finish
56      #100;
57
58      // Test case 1: mux_flag = 0, counter = 2
59      mux_flag = 0;
60      data_in_1 = 136'h0123456789abcdef0123456789abcdef;
61      data_in_2 = 136'hfedcba9876543210fedcba9876543210;
62      data_in_3 = 136'h0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f;
63      counter = 2;
64      #10;
65      $display("Test case 1: mux_flag = 0, counter = 2, data_out = %h", data_out);
66
67      // Test case 2: mux_flag = 0, counter = 14
68      counter = 14;
69      #10;
70      $display("Test case 2: mux_flag = 0, counter = 14, data_out = %h", data_out);
71
72      // Test case 3: mux_flag = 1, counter = 2
73      mux_flag = 1;
74      counter = 2;
75
75      #10;
76      $display("Test case 3: mux_flag = 1, counter = 2, data_out = %h", data_out);
77
78      // Test case 4: mux_flag = 1, counter = 15
79      counter = 15;
80      #10;
81      $display("Test case 4: mux_flag = 1, counter = 15, data_out = %h", data_out);
82
83      // End simulation
84      $finish;
85  end
86
87  endmodule

```

## Behavioral:



## Post-Route:



## 2. Counter

Verilog code:

```
1  module counter(
2      input clk,
3      input rst_n,
4      output reg [3:0] ccounter
5  );
6
7  always @(*) begin
8      if(!rst_n)
9          ccounter <= 0;
10     else
11         ccounter <= ccounter + 1;
12 end
13
14 endmodule
```

HDL:



Testbench:

```

25  module counter_tb;
26
27  // Inputs
28  reg clk;
29  reg rst_n;
30
31  // Outputs
32  wire [3:0] counter;
33
34  // Instantiate the Un
35  counter uut (
36    .clk(clk),
37    .rst_n(rst_n),
38    .counter(counter)
39 );
40
41 initial begin
42   // Initialize Inp
43   clk = 0;
44   rst_n = 0;
45
46   // Wait 100 ns fo
47   #100;
48
49   // Reset sequence
50   rst_n = 1;
51   #10;
52   rst_n = 0;
53
54   #10;
55   rst_n = 1;
56
57   // Test counter increment
58   repeat (20) begin
59     #10;
60     $display("Time=%t, counter=%d", $time, counter);
61   end
62
63   // End simulation
64   $finish;
65
66   // Clock generation
67   always #5 clk = ~clk;
68
69 endmodule

```

## Behavioral:



## Post-Route:



### 3. Mux

Verilog code:

```
21  module mux(
22      input mux_flag,
23      input clk,
24      input rst_n,
25      input [135:0] data_in_1,
26      input [135:0] data_in_2,
27      input [135:0] data_in_3,
28      output [135:0] data_out
29 );
30
31 wire [3:0] cscounter;
32
33 counter counter_inst (
34     .clk(clk),
35     .rst_n(rst_n),
36     .ccounter(cscounter)
37 );
38
39
40 smux mux_inst (
41     .mux_flag(mux_flag),
42     .data_in_1(data_in_1),
43     .data_in_2(data_in_2),
44     .data_in_3(data_in_3),
45     .scounter(cscounter),
46     .data_out(data_out)
47 );
```

## HDL:



## Testbench:

```
1  `timescale 1ns / 1ps
2
3  module mux_tb;
4
5      // Inputs
6      reg mux_flag_tb;
7      reg clk_tb;
8      reg rst_n_tb;
9      reg [135:0] data_in_1_tb;
10     reg [135:0] data_in_2_tb;
11     reg [135:0] data_in_3_tb;
12
13    // Outputs
14    wire [135:0] data_out_tb;
15
16    // Instantiate the Unit Under
17    mux uut (
18        .mux_flag(mux_flag_tb),
19        .clk(clk_tb),
20        .rst_n(rst_n_tb),
21        .data_in_1(data_in_1_tb),
22        .data_in_2(data_in_2_tb),
23        .data_in_3(data_in_3_tb),
24        .data_out(data_out_tb)
25    );
26
27    // Clock generation
28    always begin
29        clk_tb = 0;
30        #5;
31        clk_tb = 1;
32        #5;
33    end
34
35    // Reset generation
36    initial begin
37        rst_n_tb = 0;
38        #10;
39        rst_n_tb = 1;
40        #10;
41        rst_n_tb = 0;
42        #10;
43        rst_n_tb = 1;
44        #10;
45        $finish;
46    end
```

```

49     initial begin
50         mux_flag_tb = 0;
51         data_in_1_tb = 136'h0123456789abcdef0123456789abcdef;
52         data_in_2_tb = 136'hfedcba9876543210fedcba9876543210;
53         data_in_3_tb = 136'h0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f;
54         #200;
55         $display("Test case 1: mux_flag = 0, data_out = %h", data_out_tb);
56     end
57
58     // Test case 2: mux_flag = 1
59     initial begin
60         mux_flag_tb = 1;
61         #200;
62         $display("Test case 2: mux_flag = 1, data_out = %h", data_out_tb);
63     end
64
65 endmodule

```

## Behavioral:



## Post-Route:



## ● FFT

Verilog code:

```
1  <module> fft(
2
3      input  wire          clk,           // clock
4      input  wire          rst_n,         // reset
5      input  wire [33:0]    data_in,       // input from
6      output [33:0]        data_out     // output to pin
7
8  );
9
10
11  wire      mux_flag, demux_flag, s_p_flag;
12  wire[2:0]  rotation;
13  wire[135:0] data_1, data_2, data_3, data_4;
14
15  <ctrl> ctrl10(
16      .clk(clk),                // input from t
17      .rst_n(rst_n),            // input from t
18      .s_p_flag_in(s_p_flag),   // input from s
19      .mux_flag(mux_flag),      // output to mu
20      .rotation(rotation),     // output to bu
21      .demux_flag(demux_flag)  // output to p_
22  );
23
24  <s_p> s_p0(
25      .clk(clk),                // input from t
26      .rst_n(rst_n),            // input from t
27      .data_in_1(data_in),      // input from t
28      .data_out_1(data_1),       // output to mu
29      .s_p_flag_out(s_p_flag)  // output to ct
);
```

```

31   mux mux0(
32     .mux_flag(mux_flag),           //
33     .clk(clk),
34     .rst_n(rst_n),
35     .data_in_2(data_1),          //
36     .data_in_1(data_2),          //
37     .data_in_3(data_4),
38     .data_out(data_3)           //
39 );
40
41 butterfly butterfly0(
42   .calc_in(data_3),             //
43   .rotation(rotation),         //
44   .calc_out(data_4)            //
45 );
46
47 reg1 reg10(
48   .clk(clk),                  //
49   .rst_n(rst_n),              //
50   .data_in_2(data_4),
51   .data_out_2(data_2),
52   .reg_datain_flag(demux_flag)
53 );
54
55 p_s p_s0(
56   .clk(clk),
57   .rst_n(rst_n),
58   .p_s_flag_in(demux_flag),
59   .data_in_3(data_4),
60   .data_out_3(data_out)
61 );
62
63 endmodule

```

## HDL:



## Testbench:

```
1  `timescale 1ns / 1ps          29      // Reset generation
2                                         30      initial begin
3  module fft_tb;                  31          rst_n = 0;
4                                         32          #10;
5      // Inputs                  33          rst_n = 1;
6      reg clk;                   34          #10;
7      reg rst_n;                35          // Add more delay or te
8      reg [33:0] data_in;        36          $finish;
9                                         37      end
10     // Outputs                 38
11     wire [33:0] data_out;      39
12                                         40
13     // Instantiate the Unit   41      // Stimulus generation
14     fft uut (                  42      initial begin
15         .clk(clk),             43          // Initialize Inputs
16         .rst_n(rst_n),          44          data_in = 32'h12345678;
17         .data_in(data_in),       45          // Apply stimulus
18         .data_out(data_out)      46          #20; // Wait for some t
19     );                         47          data_in = 32'h87654321;
20                                         48          // Add more stimulus as
21     // Clock generation       49          // End simulation
22     always begin               50          #100;
23         clk = 0;                51          $finish;
24         #5;                     52      end
25         clk = 1;                53
26         #5;                     54
27     end                         55      endmodule
```

## Behavioral:



## Post-Route:



## ● FFT Chip

Verilog code:

```
1  module fft_chip(
2      input clk,
3      input rst_n,
4      input [33:0] data_in,
5      output [33:0] data_out
6  );
7
8  wire clk_buf, rst_n_buf;
9  wire [33:0] data_in_buf, data_out_buf;
10
11 // Input buffers
12 IBUF clk_ibuf (.O(clk_buf), .I(clk));
13 IBUF rst_n_ibuf (.O(rst_n_buf), .I(rst_n));
14 genvar i;
15 generate
16     for (i = 0; i < 34; i = i + 1) begin
17         IBUF data_in_ibuf (.O(data_in_buf[i]), .I(data_in[i]));
18     end
19 endgenerate
20
21 // Output buffers
22 genvar j;
23 generate
24     for (j = 0; j < 34; j = j + 1) begin
25         OBUF data_outobuf (.O(data_out[j]), .I(data_out_buf[j]));
26     end
27 endgenerate
28
29 fft inst_fft (
30     .clk(clk_buf),
31     .rst_n(rst_n_buf),
32     .data_in(data_in_buf),
33     .data_out(data_out_buf)
34 );
35
36
37
38 endmodule
```

HDL:



Testbench:

```
25  module fft_chip_tb;
26
27      // Inputs
28      reg clk;
29      reg rst_n;
30      reg [33:0] InputData [0:
31      reg [5:0] i;
32      reg [33:0] data_in;
33
34      // Outputs
35      wire [33:0] data_out;
36
37      // Instantiate the Unit
38      fft_chip uut (
39          .clk(clk),
40          .rst_n(rst_n),
41          .data_in(data_in),
42          .data_out(data_out)
43      );
44
45      initial begin
46          clk = 1'b0;
47          forever #4 clk = ~clk;
48      end
```

```

50 	`initial begin
51 	rst_n = 0;
52 	$display("\nLoad Data\n");
53 	$readmemb("/home/ise/NTUST_Advanced_Computer_Algorithms/fftchip/Data_Input.txt", InputData);
54
55 	#8 begin
56 	rst_n = 1'b1;
57 	for (i = 0; i < 47; i = i + 1) begin
58 	data_in = InputData[i];
59 	#8
60 	$display("[%d] %b_%b_%b %b_%b_%b\n", (i+13)%16, data_out[33], data_out[32:25], data_out[24:17],
61 	data_out[16], data_out[15:8], data_out[7:0]);
62 	end
63 end
64 end
65
66 endmodule

```

## Behavioral:



## Post-Route:



## 4. Discussion:

1.本次作業的主要目的是探討如何調用IP核，我選擇的是FFT。雖然四個主題都非常有趣，但我認為FFT對未來的幫助最大，因此選擇了它作為我的題目。我覺得這是一個非常有意義的專案，使我能夠將上學期在FPGA課程中學到的知識應用到這個專案中。這次的專案不僅鞏固了我對FPGA的基礎知識，還讓我有機會實際操作和應用先進的演算法。

2.最主要的問題是理解算法。由於這個算法是radix-4 FFT，是一種快速DFT的方法之一，而課堂上沒有學過這種改良的方法，所以我需要花一些時間來學習和理解它。這個過程涉及到對數學理論的深入研究，以及對演算法步驟的詳細理解。在理解這個改良過的算法時，我查閱了許多資料，並且觀看了一些相關的教學視頻，這讓我對FFT有了更全面的認識。

3.其次是在模組調用方面，需要理解這個晶片的設計原理以及如何進行模組間的連接，才能精確地使用這個算法。在這個過程中，我學習了如何分析和理解不同模組的功能和作用，並且掌握了如何通過正確的接口將它們連接起來。這不僅增強了我對硬體設計的理解，也提高了我在實際應用中解決問題的能力。整個過程充滿挑戰，但也非常有成就感，因為我能夠看到自己從理論學習到實踐操作的整個過程。

## 5. Conclusion

- Performance

1. Estimation

FFT operation

$$\frac{1s}{16 \times 7.4 \text{ ns}} = 8.445946 \times 10^6$$

Average Bandwidth

$$135 \text{ MHz} \times \frac{34 \text{ bits}}{9} = 510.51 \text{ MB/S}$$

Pipeline throughput

$$\frac{1s}{16 \times 7.4 \text{ ns}} \times 16 \times (17 + 17) = 4.5946 \times 10^9 \text{ bps}$$



- Power dissipation

### Power dissipation

| Item                     | Voltage/<br>Power dissipation |
|--------------------------|-------------------------------|
| Global Operating Voltage | 1.62 V                        |
| Cell Internal Power      | 756.0848 mW                   |
| Net Switching Power      | 375.6310 mW                   |
| Total Dynamic Power      | 1.1317 W                      |
| Cell Leakage Power       | 28.5281 uW                    |



## 2. Area result

| Item       | Number | Item              | Area( $\mu\text{m}^2$ ) |
|------------|--------|-------------------|-------------------------|
| Ports      | 70     | Combinational     | 1483546.414358          |
| Nets       | 140    | Non-combinational | 153599.850292           |
| Cells      | 71     | Net Interconnect  | 2089790.568665          |
| References | 3      | Total Area        | 3726936.83315           |

## 3. Evaluation index

| Parameter                                                | Value                                          |
|----------------------------------------------------------|------------------------------------------------|
| Process                                                  | SMIC 0.18 $\mu\text{m}$                        |
| Number of Pins                                           | 90                                             |
| Operating frequency                                      | 135.135 MHz                                    |
| Operating voltage                                        | 1.62 V                                         |
| Total area                                               | 4.695300 $\text{mm}^2$                         |
| Power consumption<br>per calculate operation             | $5.25118208 \times 10^6 \text{ mJ}$            |
| FFT operands per unit area and<br>unit power consumption | 41987.688<br>times/ $(\text{mm}^2 \text{mWs})$ |
| Bandwidth                                                | 510.51 MB/s                                    |
| Parallel throughput                                      | $4.5946 \times 10^9 \text{ bps}$               |

## 5.Reference

- [1]Design and Implementation of an FFT IP Generator
- [2] Garrido M, Huang S J, Chen S G. Feedforward FFT Hardware Architectures Based on Rotator Allocation[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2018, 65(2):581-592.
- [3] Zulkifli, Siva. Design of 16-point Radix-4 Fast Fourier Transform in 0.18 $\mu$ m CMOS Technology. American Journal of Applied Sciences. 2007.
- [4] 丁晓磊, 朱恩, 赵梅. 16点基4-FFT芯片设计技术研究[J]. 信息技术, 2007(01): 64-67+71.
- [5] DJJ 工程數學特論 講義
- [6]DJJ ADSP 講義