

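# FFT Design & Verification Project

Korea Chamber of Commerce and Industry (KCCI), Seoul Technical Education Center  
AI System Semiconductor Design (2nd cohort)

Team 1: 고종완, 권희식, 장찬원, 서윤철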



## CONTENT

01

**Floating Point Model  
& Fixed Point Model**

02

**Module Simulation  
& Verification**

03

**CBFP Simulation  
& Verification**

04

**Synthesis**

05

**Trouble Shooting**

06

**Discussion**

# FFT (Fast Fourier Transform)

- The fast Fourier transform (FFT) is an algorithm for efficiently computing the discrete Fourier transform (DFT). It reduces the cost of an N-point transform from O(N²) to O(N log N) operations.
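As a rough illustration of the relationship between the two transforms, the sketch below evaluates the DFT definition directly and compares it with a recursive radix-2 FFT. The function names `dft` and `fft` are hypothetical, and the radix-2 decimation-in-time form is chosen only for brevity; it is not necessarily the structure used in this project.

```python
import cmath


def dft(x):
    """Direct O(N^2) evaluation of the DFT definition."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (len(x) must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t            # butterfly: upper output
        out[k + n // 2] = even[k] - t   # butterfly: lower output
    return out
```

Both functions return the same spectrum up to floating-point rounding; only the operation count differs.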



# PART 1. Floating Point Model & Fixed Point Model

## Floating-point FFT

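- Purpose: algorithm verification and functional implementation

- Characteristics:
  - Very high precision.
  - Few overflow and underflow problems.
  - Provides accurate reference values in software simulation (MATLAB, Python).

- Uses:
  - Confirms that the algorithm behaves as intended.
  - Serves as the reference model before the fixed-point design.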

## Fixed-point FFT

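- Purpose: hardware implementation

- Characteristics:
  - Represents the integer and fractional parts within a limited bit width.
  - Requires managing overflow and scaling (shifts).
  - Allows resource (LUT, DSP) optimization in FPGA and ASIC designs.

- Uses:
  - RTL design and FPGA/ASIC implementation.
  - Reduces arithmetic complexity, enabling high-speed computation.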
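The difference between the two models can be sketched as a quantization step: a floating-point value is scaled, rounded to an integer code, and saturated to the available bit width. The helper names and the example format (9 bits total, 8 fractional bits) are assumptions for illustration, not the formats used in the project.

```python
def to_fixed(value, frac_bits, total_bits):
    """Quantize a real value to a signed fixed-point integer code with saturation."""
    scaled = round(value * (1 << frac_bits))   # scale by 2^frac_bits and round
    lo = -(1 << (total_bits - 1))              # most negative representable code
    hi = (1 << (total_bits - 1)) - 1           # most positive representable code
    return max(lo, min(hi, scaled))            # saturate instead of wrapping


def to_float(code, frac_bits):
    """Convert a fixed-point integer code back to a real value."""
    return code / (1 << frac_bits)
```

With 8 fractional bits in a 9-bit word, `to_fixed(1.0, 8, 9)` saturates because +1.0 is just outside the representable range, which is the kind of overflow the fixed-point model has to manage.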

# FFT (Fast Fourier Transform)

MATLAB

## FFT Block Diagram



# FFT (Fast Fourier Transform)

MATLAB

```
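% Unscaled twiddle factors (1 and -j only, so no multiplier is needed)
fac8_0 = [1, 1, 1, -j];
% Twiddle factors scaled by 256 (2^8); 181 = round(256/sqrt(2))
fac8_1 = [256, 256, 256, -j*256, 256, 181-j*181, 256, -181-j*181];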
```
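The constants in `fac8_1` are consistent with 8-point twiddle factors scaled by 256 and rounded to integers. The following sketch checks that numerically; `scaled_twiddle` is a hypothetical helper, and the scale of 256 is read off the listing above.

```python
import cmath


def scaled_twiddle(k, n=8, scale=256):
    """Return W_n^k = exp(-j*2*pi*k/n), scaled and rounded to integer parts."""
    w = cmath.exp(-2j * cmath.pi * k / n)
    return complex(round(w.real * scale), round(w.imag * scale))
```

For example, `scaled_twiddle(1)` gives `181-181j` and `scaled_twiddle(3)` gives `-181-181j`, matching the entries in the listing.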



# FFT (Fast Fourier Transform)

MATLAB

Fixed Point FFT



# FFT (Fast Fourier Transform)

Hardware Block Diagram



# FFT (Fast Fourier Transform)

## FFT BLOCK DIAGRAM



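# SQNR (Signal-to-Quantization-Noise Ratio)

A metric that indicates the degree of quantization error.

The higher the SQNR, the lower the quantization noise and the closer the quantized signal is to the original signal.

## SQNR and Quantization Noise

Quantization noise is the error introduced when an analog signal is converted to digital form.

It arises when fine detail in the signal is lost or rounded off.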

## SQNR and CBFP

CBFP is used to reduce the quantization error that arises in fixed-point arithmetic.

CBFP adjusts the bit width dynamically, which reduces quantization error and improves SQNR.
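SQNR is commonly computed as the ratio of signal power to error power, expressed in decibels. A minimal sketch, assuming the floating-point result is the reference and the fixed-point result is the quantized signal (`sqnr_db` is a hypothetical helper name):

```python
import math


def sqnr_db(reference, quantized):
    """SQNR in dB: 10*log10(sum|ref|^2 / sum|ref - quantized|^2)."""
    signal = sum(abs(r) ** 2 for r in reference)
    noise = sum(abs(r - q) ** 2 for r, q in zip(reference, quantized))
    if noise == 0:
        return math.inf  # identical signals: no quantization noise
    return 10 * math.log10(signal / noise)
```

The inputs may be real or complex sequences, so the same function can compare FFT outputs of the floating-point and fixed-point models bin by bin.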

# PART 2. Module Simulation & Verification

# Module0 – step0\_0



# Module0 – step0\_1



# Module0 – step0\_2

```
# step0_2
# VeriQ valid_bfly01_out
# VeriQ bfly02_tmp_i[511:0][12:0]
# VeriQ bfly02_tmp_q[511:0][12:0]
# VeriQ valid_bfly02_tmp_out
# VeriQ sat_out_i[511:0][12:0]
# VeriQ sat_out_q[511:0][12:0]
# VeriQ valid_saturation_out
# VeriQ pre_bfly02_i[511:0][23:0]
# VeriQ pre_bfly02_q[511:0][23:0]
# VeriQ valid_mul_twf_m0_out
# VeriQ pre_bfly02_real_buf[0:511][22:0]
# VeriQ pre_bfly02_imag_buf[0:511][22:0]
# VeriQ bfly02_i[511:0][10:0]
# VeriQ bfly02_q[511:0][10:0]
# VeriQ valid_output
```





# Module1 – step1\_0



# Module1 – step1\_1



# Module1 – step1\_2

```
# step1_2
Ver A valid_bfly12_out
Ver A bfly12_tmp_i[0:511][14:0]
Ver A bfly12_tmp_q[0:511][14:0]
Ver B valid_bfly12_tmp_out
Ver A pre_bfly12_i[0:511][24:0]
Ver A pre_bfly12_q[0:511][24:0]
Ver B valid_mul_twf_m1_out
Ver A pre_bfly12_real_buf[0:511][24:0]
Ver A pre_bfly12_imag_buf[0:511][24:0]
Ver D bfly12_i[0:511][11:0]
Ver D bfly12_q[0:511][11:0]
Ver B valid_output
```



```
# step1_2
Ver A valid_bfly12_out
Ver A bfly12_tmp_i[0:511][14:0]
Ver A bfly12_tmp_q[0:511][14:0]
Ver B valid_bfly12_tmp_out
Ver A pre_bfly12_i[0:511][24:0]
Ver A pre_bfly12_q[0:511][24:0]
Ver B valid_mul_twf_m1_out
Ver A pre_bfly12_real_buf[0:511][24:0]
Ver A pre_bfly12_imag_buf[0:511][24:0]
Ver B valid_start_out
Ver D bfly12_i[0:511][11:0]
Ver D bfly12_q[0:511][11:0]
Ver B valid_output
```



# Module2 – step2\_0



# Module2 – step2\_1



# Module2 – step2\_2



# PART 3. CBFP Simulation & Verification

# CBFP



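Purpose: apply shared scaling in fixed-point arithmetic to extend the dynamic range and prevent overflow.

Method: for each block, find the minimum headroom of the input data, then apply shift scaling.

Advantages: preserves precision while keeping the hardware efficient, and saves memory and arithmetic units in large computations such as the FFT.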

# CBFP



8 blocks  
64 data values per block

64 blocks  
8 data values per block

# CBFP

MSB == 1

```
// Reconstructed from a garbled capture: headroom detection for a negative
// 23-bit input d (MSB == 1). Headroom is the number of redundant sign bits.
// casez takes the first matching pattern; '?' marks don't-care bits.
// The MSB == 0 case mirrors this table with leading zeros.
casez (d)
    23'b11111111111111111111111: headroom <= 22;
    23'b1111111111111111111111?: headroom <= 21;
    23'b111111111111111111111??: headroom <= 20;
    23'b11111111111111111111???: headroom <= 19;
    23'b1111111111111111111????: headroom <= 18;
    23'b111111111111111111?????: headroom <= 17;
    23'b11111111111111111??????: headroom <= 16;
    23'b1111111111111111???????: headroom <= 15;
    23'b111111111111111????????: headroom <= 14;
    23'b11111111111111?????????: headroom <= 13;
    23'b1111111111111??????????: headroom <= 12;
    23'b111111111111???????????: headroom <= 11;
    23'b11111111111????????????: headroom <= 10;
    23'b1111111111?????????????: headroom <= 9;
    23'b111111111??????????????: headroom <= 8;
    23'b11111111???????????????: headroom <= 7;
    23'b1111111????????????????: headroom <= 6;
    23'b111111?????????????????: headroom <= 5;
    23'b11111??????????????????: headroom <= 4;
    23'b1111???????????????????: headroom <= 3;
    23'b111????????????????????: headroom <= 2;
    23'b11?????????????????????: headroom <= 1;
    23'b1??????????????????????: headroom <= 0;
endcase
```

MSB == 0

## Implementation



| Stage      | Description               | Implementation             |
|------------|---------------------------|----------------------------|
| mag_detect | Finds the headroom        | MUX (case)                 |
| min_detect | Finds the minimum value   | Combinational logic        |
| shift      | Applies the scaling       | Shift operators (<<<, >>>) |

# CBFP

BLOCK



12bit → CBFP → 7bit

| Decimal | Binary (12bit) | headroom | Binary after CBFP (7bit) |
|---------|----------------|----------|--------------------------|
| -20     | 1111 1110 1100 | 6        | 1 10 1100                |
| -10     | 1111 1111 0110 | 7        | 1 11 0110                |
| 0       | 0000 0000 0000 | 11       | 0 00 0000                |
| 10      | 0000 0000 1010 | 7        | 0 00 1010                |
| 20      | 0000 0001 0100 | 6        | 0 01 0100                |
| 30      | 0000 0001 1110 | 6        | 0 01 1110                |
| 40      | 0000 0010 1000 | 5        | 0 10 1000                |
| 50      | 0000 0011 0010 | 5        | 0 11 0010                |

The minimum headroom in the block is 5, so the 5 redundant sign bits shared by every value are dropped and each value fits in 7 bits.
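The 12-bit example block can be reproduced in software. The sketch below borrows the stage names `mag_detect` and `min_detect` from the implementation table; `cbfp_block` is a hypothetical wrapper. It is a behavioural model of the idea, not the RTL: where the hardware uses shift operators, this model simply keeps the low `bits - headroom` bits, which drops the same redundant sign bits.

```python
def mag_detect(value, bits):
    """Headroom: number of redundant sign bits in a two's-complement value."""
    # Inverting a negative value turns its leading ones into leading zeros.
    magnitude = ~value if value < 0 else value
    return (bits - 1) - magnitude.bit_length()


def min_detect(headrooms):
    """The block shares one scale, set by the value with the least headroom."""
    return min(headrooms)


def cbfp_block(block, bits):
    """Return (shared headroom, reduced width, reduced two's-complement patterns)."""
    shared = min_detect([mag_detect(v, bits) for v in block])
    width = bits - shared
    mask = (1 << width) - 1
    patterns = [format(v & mask, f"0{width}b") for v in block]
    return shared, width, patterns
```

Calling `cbfp_block([-20, -10, 0, 10, 20, 30, 40, 50], 12)` yields a shared headroom of 5 and 7-bit patterns such as `1101100` for -20.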

CBFP02

Scales 23-bit data down to 11 bits

CBFP12

Scales 25-bit data down to 12 bits
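When the output width is fixed, the shared headroom decides how far the block moves: one way to model it is to compare the headroom with the number of bits that must be removed and shift left or right accordingly. The sketch below assumes that interpretation; `headroom` and `cbfp_requantize` are hypothetical names, and the rounding behaviour (truncation toward negative infinity on right shifts) is an assumption rather than a confirmed property of the RTL.

```python
def headroom(value, bits):
    """Number of redundant sign bits in a two's-complement value."""
    magnitude = ~value if value < 0 else value
    return (bits - 1) - magnitude.bit_length()


def cbfp_requantize(block, in_bits, out_bits):
    """Scale a block by its shared headroom so every value fits in out_bits."""
    shared = min(headroom(v, in_bits) for v in block)
    net = shared - (in_bits - out_bits)  # > 0: left shift, < 0: right shift
    if net >= 0:
        scaled = [v << net for v in block]
    else:
        scaled = [v >> -net for v in block]  # arithmetic shift on Python ints
    return shared, scaled
```

For a 23-bit block `[1000, -3000, 12]` and an 11-bit output, the shared headroom is 10, so two low bits are discarded and the result is `[250, -750, 3]`.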



512 / FSM, CNT

V1



16 / CNT

V2

# 512 / FSM, CNT → Input: cosine(pre\_bfly02)

Processes 16 values at a time



Latency: 68 clk

# 16 / CNT → Input: cosine(pre\_bfly02)



# 16 / CNT → Input: random(pre\_bfly02)



# CBFP02 AREA

```

Report : timing
  -path full
  -delay max
  -max_paths 1
Design : CBFP_02
Version: V-2023.12-SP5-4
Date : Mon Aug 4 17:54:35 2025
*****  

# A fanout number of 1000 was used for high fanout net computations.  

*****  

Operating Conditions: TT_0P80V_0P00V_0P00V_0P00V_25C   Library: GF22FDX_SC7P5T_116CPP_BASE_CSC20L_TT_0P80V_0P00V_0P00V_0P00V_25C

Wire Load Model Mode: enclosed

*****  

Startpoint: GEN_HEADROOM_0_u_hd_im/headroom_reg_3
  (rising edge-triggered flip-flop clocked by cnt_clk)
Endpoint: temp_im_reg_1_2
  (rising edge-triggered flip-flop clocked by cnt_clk)
Path Group: cnt_clk
Path Type: max  

*****  

Point           Incr      Path
-----  

clock cnt_clk (rise edge)        0.00      0.00
clock network delay (ideal)    0.00      0.00
GEN_HEADROOM_0_u_hd_im/headroom_reg_3/_CLK (SCP75T_SDFRQX1_A_CSC2BL) 0.00      0.00 r
GEN_HEADROOM_0_u_hd_im/headroom_reg_3/_Q (SCP75T_SDFRQX1_B_CSC2BL) 0.00      0.00 r
                                         43.03    43.03 f
GEN_HEADROOM_0_u_hd_im/headroom[3] (headroom_detect_23_36) 0.00      43.03 f
                                         28.95    21.19 f
U411297/Z (SCP75T_NR21AX2_CSC2BL) 21.41    64.44 f
U258974/Y (SCP75T_INVX1_CSC2BL) 8.38     72.82 r
U258973/Z (SCP75T_OA1Z1X1_CSC2BL) 11.15    83.97 f
U247056/Z (SCP75T_AO1Z1X1_CSC2BL) 26.15    110.12 r
U254118/Z (SCP75T_NR2X3_CSC2BL) 22.74    132.85 f
U411327/Z (SCP75T_MUX1X21_CSC2BL) 34.19    167.85 f
U411330/Z (SCP75T_OA1Z1X2_CSC2BL) 24.18    191.23 r
U259344/Z (SCP75T_OA1Z1X2_CSC2BL) 30.00    212.19 f
                                         28.95    212.19 f
U411331/Z (SCP75T_MUX1X21_CSC2BL) 33.33    245.51 f
U411337/Z (SCP75T_OA1Z1X2_CSC2BL) 25.38    270.89 r
U253063/Z (SCP75T_OA1Z1X2_CSC2BL) 27.72    298.61 f
U411338/Z (SCP75T_NR2X2_MR_CSC2BL) 15.19    313.88 r
U247584/Z (SCP75T_OA1Z1X2_CSC2BL) 14.06    327.86 f
U411341/Z (SCP75T_AO1Z1X44_CSC2BL) 24.58    352.44 r
U411346/Z (SCP75T_NR2X2_MR_CSC2BL) 16.09    364.53 f
U411347/Z (SCP75T_OA1Z1X2_CSC2BL) 14.75    378.22 f
U411353/Z (SCP75T_OA1Z1X2_CSC2BL) 19.49    397.57 f
U411355/Z (SCP75T_OA1Z1X44_CSC2BL) 21.04    419.40 f
U249801/Z (SCP75T_INVX2_CSC2BL) 11.11    450.51 f
U249802/Z (SCP75T_INVX2_CSC2BL) 7.62     458.33 r
U259455/Z (SCP75T_OA1Z1X2_CSC2BL) 12.43    459.56 f
U254104/Z (SCP75T_OA1Z2X1_CSC2BL) 15.57    466.13 r
U251940/Z (SCP75T_OA1Z1X3_CSC2BL) 20.20    486.33 f
U247950/Z (SCP75T_INVX3_CSC2BL) 14.11    500.44 r
U251939/Z (SCP75T_NR2X3_CSC2BL) 8.98     509.42 f
U411361/Z (SCP75T_OA1Z1X3_CSC2BL) 11.94    521.37 r
U259458/Z (SCP75T_ND2X2_CSC2BL) 10.53    531.89 f
U259440/Z (SCP75T_OA1Z1X2_CSC2BL) 18.93    550.82 r
U411368/Z (SCP75T_MUX1X21_CSC2BL) 29.91    580.73 r
U411372/Z (SCP75T_OA1Z2X1_CSC2BL) 23.48    604.21 f
U411373/Z (SCP75T_OA1Z2X2_CSC2BL) 25.59    629.88 r
U411374/Z (SCP75T_INVX2_CSC2BL) 12.81    642.61 f
U259550/Z (SCP75T_OA1Z1X1_CSC2BL) 17.79    668.40 r
U254098/Z (SCP75T_OA1Z1X2_CSC2BL) 19.43    679.83 f
U411992/Z (SCP75T_MUX1X21_CSC2BL) 33.80    713.64 f
U411997/Z (SCP75T_OA1Z1X1_CSC2BL) 27.50    741.14 r
U411998/Z (SCP75T_NR2X2_MR_CSC2BL) 19.51    766.64 f
U412080/Z (SCP75T_MUX1X21_CSC2BL) 32.99    793.64 f
U412085/Z (SCP75T_OA1Z1X1_CSC2BL) 23.75    817.39 r
U259552/Z (SCP75T_ND2X2_MR_CSC2BL) 17.35    834.73 f
U259481/Z (SCP75T_MUX1X21_A_CSC2BL) 38.29    873.02 f
U259482/Z (SCP75T_INVX2_CSC2BL) 9.28     882.22 f
U250288/Z (SCP75T_OA1X3X_CSC2BL) 27.59    899.81 f
U259432/Z (SCP75T_MUX1X21_CSC2BL) 27.01    936.82 f
U259695/Z (SCP75T_OA1Z1X1_MR_CSC2BL) 23.78    960.68 r
U412017/Z (SCP75T_OA1Z1X2_CSC2BL) 22.38    982.98 f
U254106/Z (SCP75T_MUX1X22_CSC2BL) 40.10    1023.88 r
U254105/Z (SCP75T_INVX3_CSC2BL) 9.95     1033.03 f
U412024/Z (SCP75T_OA1Z1X3_CSC2BL) 17.49    1050.52 r
U412025/Z (SCP75T_OA1Z1X4_CSC2BL) 17.28    1067.80 f
U412026/Z (SCP75T_MUX1X21_CSC2BL) 32.49    1080.29 f
U412031/Z (SCP75T_OA1Z1X1_CSC2BL) 21.70    1121.99 r
U259456/Z (SCP75T_OA1X3X_CSC2BL) 38.12    1152.11 r
U251913/Z (SCP75T_INVX3_CSC2BL) 10.47    1162.58 f
U247490/Z (SCP75T_MUX1X22_CSC2BL) 30.78    1193.35 f
U254095/Z (SCP75T_OA1Z1X2_CSC2BL) 24.47    1217.82 f
U263076/Z (SCP75T_OA1X3X_CSC2BL) 24.70    1242.53 f
U251909/Z (SCP75T_INVX4_CSC2BL) 21.43    1263.96 f
U258969/Z (SCP75T_ND2X4_CSC2BL) 14.09    1278.04 f
U412042/Z (SCP75T_OA1Z1X6_CSC2BL) 22.02    1300.06 r
temp_im_reg_1_2/D (SCP75T_SDFRQX1_S_CSC2BL) 0.00      1300.06 r
data arrival time                 1300.06

*****

clock cnt_clk (rise edge)          1400.00    1400.00
clock network delay (ideal)       0.00      1400.00
clock uncertainty                -50.00    1350.00
temp_im_reg_1_2/CLK (SCP75T_SDFRQX1_S_CSC2BL) 0.00      1350.00 r
library setup time               -49.82    1300.18
data required time                1300.18

*****

data required time                1300.18
data arrival time                 -1300.06

*****

slack (MET)                      0.12

```

```
Library(s) Used:

    GF22FDX_SC7P5T_116CPP_BASE_CSC20L_TT_0P80V_0P00V_0P00V_0P00V_25C

Number of ports: 41236
Number of nets: 241535
Number of cells: 216783
Number of combinational cells: 204709
Number of sequential cells: 11974
Number of macros/black boxes: 0
Number of buf/inv: 48747
Number of references: 270

Combinational area: 64211.568428
Buf/Inv area: 8959.956063
Noncombinational area: 19120.442858
Macro/Black Box area: 0.000000
Net Interconnect area: undefined (No wire load specified)

Total cell area: 83332.011285
Total area: undefined
```

# 512 / FSM, CNT

Library: GF22FDX\_SC7P5T\_116CPP\_BASE\_CSC20L\_TT\_0P80V\_0P00V\_0P00V\_0P00V\_25C

| Item                           | Value                              |
|--------------------------------|------------------------------------|
| Number of ports:               | 2272                               |
| Number of nets:                | 15898                              |
| Number of cells:               | 14085                              |
| Number of combinational cells: | 9013                               |
| Number of sequential cells:    | 5002                               |
| Number of macros/black boxes:  | 0                                  |
| Number of buf/inv:             | 2296                               |
| Number of references:          | 182                                |
| Combinational area:            | 3681.561600                        |
| Buf/Inv area:                  | 938.625587                         |
| Noncombinational area:         | 9503.879788                        |
| Macro/Black Box area:          | 0.000000                           |
| Net Interconnect area:         | undefined (No wire load specified) |
| Total cell area:               | 13185.441388                       |
| Total area:                    | undefined                          |

16 / CNT

Information: Updating design information... (UID-85)  
 Warning: Design "CBFP\_02" contains 3 high-fanout nets. A fanout number of 1000  
 \*\*\*\*\*  
 Report : timing  
 -path full  
 -delay max  
 -max\_paths 1  
 Design : CBFP\_02  
 Version: V-2023.12-SP5-4  
 Date : Mon Aug 4 15:45:24 2025  
 \*\*\*\*\*

\# A fanout number of 1000 was used for high fanout net computations.

Operating Conditions: TT\_0P80V\_0P00V\_0P00V\_0P00V\_25C Library: GF22FDX\_SC7P5T\_116CPP\_BASE\_CSC20L\_TT\_0P80V\_0P00V\_0P00V\_0P00V\_25C  
 Wire Load Model Mode: enclosed

Startpoint: GEN\_HEADROOM\_1\_u\_hd\_im/headroom\_reg\_4\_  
 (rising edge-triggered flip-flop clocked by cnt\_clk)  
 Endpoint: group\_min\_reg\_0\_0  
 (rising edge-triggered flip-flop clocked by cnt\_clk)  
 Path Group: cnt\_clk  
 Path type: max

| Point                                                               | Incr    | Path      |
|---------------------------------------------------------------------|---------|-----------|
| clock cnt_clk (rise edge)                                           | 0.00    | 0.00      |
| clock network delay (ideal)                                         | 0.00    | 0.00      |
| GEN_HEADROOM_1_u_hd_im/headroom_reg_4 / CLK (SC7PS1_SDFFQX4_CSC2BL) | 0.00    | 0.00 r    |
| GEN_HEADROOM_1_u_hd_im/headroom_reg_4/Q (SC7PS1_SDFFQX4_CSC2BL)     | 0.00    | 55.90 f   |
| GEN_HEADROOM_1_u_hd_im/headroom[4] (headroom_detect_23_28)          | 0.00    | 55.90 f   |
| U6388/Z (SC7PS1_ND2IA2_XCSC2BL)                                     | 21.55   | 77.46 f   |
| S6381/Z (SC7PS1_ND2X2_XCSC2BL)                                      | 11.58   | 89.83 r   |
| U7085/Z (SC7PS1_OA1X2_XCSC2BL)                                      | 14.23   | 103.26 f  |
| E6471/Z (SC7PS1_NP2X2_XCSC2BL)                                      | 16.37   | 119.63 r  |
| U6781/Z (SC7PS1_UF4X4_XCSC2BL)                                      | 16.73   | 136.36 r  |
| U6677/Z (SC7PS1_NP2X2_XCSC2BL)                                      | 13.98   | 139.25 f  |
| E6677/Z (SC7PS1_NP2X2_XCSC2BL)                                      | 24.38   | 170.0 f   |
| S6276/Z (SC7PS1_NR2X2_XCSC2BL)                                      | 12.62   | 183.25 r  |
| S6381/Z (SC7PS1_OA1X2_XCSC2BL)                                      | 15.91   | 199.16 f  |
| U6189/Z (SC7PS1_NP2X2_MR_CSC2BL)                                    | 12.34   | 211.50 r  |
| U6278/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 10.74   | 222.24 f  |
| U6379/Z (SC7PS1_OA1X2_XCSC2BL)                                      | 18.28   | 240.52 r  |
| U6088/Z (SC7PS1_NP2X1_MR_CSC2BL)                                    | 16.72   | 257.24 f  |
| U6378/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 10.64   | 267.87 r  |
| U6394/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 10.00   | 270.0 r   |
| S6391/Z (SC7PS1_AP0122X4_XCSC2BL)                                   | 15.19   | 275.88 r  |
| S6587/Z (SC7PS1_NP2X2_MR_CSC2BL)                                    | 20.87   | 317.94 r  |
| U6258/Z (SC7PS1_AP0221A1X2_XCSC2BL)                                 | 16.98   | 334.85 r  |
| U6798/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 12.14   | 346.98 r  |
| U6778/Z (SC7PS1_AP0121X1_MR_CSC2BL)                                 | 21.68   | 368.66 r  |
| U6749/Z (SC7PS1_OA1X2_XCSC2BL)                                      | 17.74   | 386.48 f  |
| S6183/Z (SC7PS1_AP0121X3_XCSC2BL)                                   | 21.42   | 407.82 r  |
| U6123/Z (SC7PS1_NP2X2_MR_CSC2BL)                                    | 18.88   | 418.70 f  |
| U6352/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 16.37   | 437.0 r   |
| U6142/Z (SC7PS1_NP2X2_XCSC2BL)                                      | 21.01   | 455.58 r  |
| U6252/Z (SC7PS1_NP2X2_MR_CSC2BL)                                    | 8.37    | 467.85 f  |
| U6258/Z (SC7PS1_AP0221A1X2_XCSC2BL)                                 | 13.62   | 488.67 r  |
| U6122/Z (SC7PS1_NP2X2_XCSC2BL)                                      | 24.10   | 504.76 r  |
| U6478/Z (SC7PS1_OA1X2_XCSC2BL)                                      | 22.10   | 526.86 f  |
| A6469/Z (SC7PS1_ND2X2_XCSC2BL)                                      | 13.22   | 540.08 r  |
| U6795/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 17.53   | 557.60 f  |
| U6121/Z (SC7PS1_AP0122X4_XCSC2BL)                                   | 29.11   | 586.72 f  |
| U6351/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 21.07   | 608.0 r   |
| U6130/Z (SC7PS1_NP2X1_MR_CSC2BL)                                    | 20.83   | 637.63 r  |
| H6792/Z (SC7PS1_AP0122X1_XCSC2BL)                                   | 23.56   | 661.18 r  |
| S6110/Z (SC7PS1_OA1X2_XCSC2BL)                                      | 18.91   | 688.09 r  |
| U6693/Z (SC7PS1_AP0122X2_XCSC2BL)                                   | 12.58   | 692.67 f  |
| U6692/Z (SC7PS1_NP2X1_MR_CSC2BL)                                    | 15.23   | 707.98 r  |
| U7123/Z (SC7PS1_OA1X2_XCSC2BL)                                      | 20.01   | 728.81 f  |
| S6332/Z (SC7PS1_NP2X2_XCSC2BL)                                      | 19.53   | 748.34 r  |
| U6352/Z (SC7PS1_ND2X2_XCSC2BL)                                      | 12.81   | 760.35 f  |
| U6353/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 10.20   | 770.0 r   |
| U6168/Z (SC7PS1_AP0122X1_XCSC2BL)                                   | 23.68   | 791.23 r  |
| J6713/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 17.69   | 808.91 r  |
| S6119/Z (SC7PS1_NP2X2_XCSC2BL)                                      | 17.00   | 825.90 f  |
| G6491/Z (SC7PS1_NP2X2_XCSC2BL)                                      | 38.29   | 864.19 r  |
| U6492/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 9.66    | 873.85 f  |
| U7149/Z (SC7PS1_AP0121X1_XCSC2BL)                                   | 20.07   | 893.92 r  |
| U6495/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 25.78   | 919.71 r  |
| A6496/Z (SC7PS1_AP0122X4_XCSC2BL)                                   | 18.55   | 938.25 f  |
| U6671/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 20.94   | 960.0 r   |
| S6612/Z (SC7PS1_AP0221A1X2_XCSC2BL)                                 | 25.34   | 985.49 r  |
| N6119/Z (SC7PS1_NP2X1_XCSC2BL)                                      | 28.53   | 1014.03 r |
| S6532/Z (SC7PS1_NP2X3_XCSC2BL)                                      | 19.59   | 1033.62 r |
| G6227/Z (SC7PS1_ND2IA2_XCSC2BL)                                     | 11.41   | 1045.02 f |
| G6226/Z (SC7PS1_OA1X2_XCSC2BL)                                      | 9.94    | 1054.96 r |
| G6222/Z (SC7PS1_ND2X2_XCSC2BL)                                      | 11.57   | 1066.53 f |
| I6115/Z (SC7PS1_NP2X2_XCSC2BL)                                      | 11.98   | 1078.43 r |
| G6221/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 15.38   | 1094.36 f |
| T6168/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 11.17   | 1104.8 r  |
| S6367/Z (SC7PS1_AP0121X2_XCSC2BL)                                   | 15.50   | 1124.83 r |
| H6168/Z (SC7PS1_AP0221A1X2_XCSC2BL)                                 | 17.72   | 1141.75 r |
| S6583/Z (SC7PS1_AP0122X1_XCSC2BL)                                   | 21.18   | 1162.93 f |
| group_min_reg_0_d_0 / D (SC7PS1_SDFFQX4_CSC2BL)                     | 0.00    | 1299.94 r |
| data arrival time                                                   |         | 1299.94   |
| clock cnt_clk (rise edge)                                           | 1400.00 | 1400.00   |
| clock network delay (ideal)                                         | 0.00    | 1400.00   |
| clock uncertainty                                                   | -50.00  | 1350.00   |
| group_min_reg_0_d_0 / CLK (SC7PS1_SDFFQX4_CSC2BL)                   | 0.00    | 1350.00 r |
| library setup time                                                  | -49.82  | 1300.18   |
| data required time                                                  |         | 1300.18   |
| data required time                                                  |         | 1300.18   |
| data arrival time                                                   |         | -1299.94  |
| slack (MET)                                                         |         | 0.24      |

# CBFP Results

CBFP02

bfly02 (cosine)



<Fixed Model>

<RTL>

CBFP12

bfly12



<Fixed Model>

<RTL>

# PART 4. Synthesis

# Setup Timing Check

- The setup timing check meets the timing requirement (MET) with a slack of 0.54 ps.

| Point                                                                 | Incr    | Path     |
|-----------------------------------------------------------------------|---------|----------|
| clock cnt_clk (rise edge)                                             | 1400.00 | 1400.00  |
| clock network delay (ideal)                                           | 0.00    | 1400.00  |
| clock uncertainty                                                     | -50.00  | 1350.00  |
| u1_module0/u1_cbfp02/temp_im_reg_0__1_/CLK (SC7P5T_SDFFRQX1_S_CSC20L) | 0.00    | 1350.00  |
| library setup time                                                    | -53.39  | 1296.61  |
| data required time                                                    |         | 1296.61  |
| data required time                                                    |         | 1296.61  |
| data arrival time                                                     |         | -1296.07 |
| slack (MET)                                                           |         | 0.54     |
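The slack in the report follows from simple arithmetic: the required time is the clock period minus the clock uncertainty and the library setup time, and the slack is the required time minus the data arrival time. A small sketch with the values from the table (`setup_slack` is a hypothetical helper; the clock network delay is ideal, so it is taken as 0):

```python
def setup_slack(period, uncertainty, setup_time, arrival):
    """Return (data required time, slack) for a setup check with an ideal clock."""
    required = period - uncertainty - setup_time
    return required, required - arrival


required, slack = setup_slack(period=1400.00, uncertainty=50.00,
                              setup_time=53.39, arrival=1296.07)
```

A positive slack means the data arrives before it is required, which is what the report marks as MET.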

# Cell Count & Area Report

- The synthesized circuit has a total cell area of 2,579,028.72 and consists of about 2.65 million cells (2,648,318).

```
Library(s) Used:  
  
GF22FDX_SC7P5T_116CPP_BASE_CSC20L_TT_0P80V_0P00V_0P00V_0P00V_25C (Fil  
  
Number of ports: 1110800  
Number of nets: 3617970  
Number of cells: 2648318  
Number of combinational cells: 1575626  
Number of sequential cells: 1061152  
Number of macros/black boxes: 0  
Number of buf/inv: 342339  
Number of references: 6  
  
Combinational area: 656223.876991  
Buf/Inv area: 56956.743001  
Noncombinational area: 1922804.843500  
Macro/Black Box area: 0.000000  
Net Interconnect area: undefined (No wire load specified)  
  
Total cell area: 2579028.720491  
Total area: undefined
```

# Current Synthesis Status



Alveo U200 Data Center Accelerator Card

```
# Clock constraint
create_clock -name sys_clk -period 10 [get_ports clk]

# IOSTANDARD setting (PACKAGE_PIN is not assigned)
set_property IOSTANDARD LVCMOS18 [get_ports clk]
```

• DRC (2 errors)  
• Pin Planning (2 errors)

- [DRC NSTD-1] Unspecified IO Standard: 1 out of 1 logical ports use IO standard (IOSTANDARD) value **DEFAULT**, instead of a user assigned specific value. This may cause I/O contention or incompatibility with the board power or connectivity affecting performance, signal integrity or in extreme cases cause damage to the device or the components to which it is connected. To correct this violation, specify all I/O standards. This design will fail to generate a bitstream unless all logical ports have a user specified I/O standard value defined. To allow bitstream creation with unspecified I/O standard values (not recommended), use this command: set\_property SEVERITY {Warning} [get\_drc\_checks NSTD-1]. NOTE: When using the Vivado Runs Infrastructure (e.g. launch\_runs Tcl command), add this command to a .tcl file and add that file as a pre-hook for write\_bitstream step for the implementation run. Problem ports: clk.
- [DRC UCIO-1] Unconstrained Logical Port: 1 out of 1 logical ports have no user assigned specific location constraint (LOC). This may cause I/O contention or incompatibility with the board power or connectivity affecting performance, signal integrity or in extreme cases cause damage to the device or the components to which it is connected. To correct this violation, specify all pin locations. This design will fail to generate a bitstream unless all logical ports have a user specified site LOC constraint defined. To allow bitstream creation with unspecified pin locations (not recommended), use this command: set\_property SEVERITY {Warning} [get\_drc\_checks UCIO-1]. NOTE: When using the Vivado Runs Infrastructure (e.g. launch\_runs Tcl command), add this command to a .tcl file and add that file as a pre-hook for write\_bitstream step for the implementation run. Problem ports: clk.

[Vivado 12-134] Error(s) found during DRC. Bitgen not run.

## Timing

|                              |             |
|------------------------------|-------------|
| Worst Negative Slack (WNS):  | -0.548 ns   |
| Total Negative Slack (TNS):  | -600.758 ns |
| Number of Failing Endpoints: | 3199        |
| Total Number of Endpoints:   | 1705469     |

## FFT Module0 Comparison



Operates on all 512 points at once

Latency increases as the steps pass

# FFT Module0 Comparison

|            | V1(512)          | V2(16)            | Change          |
|------------|------------------|-------------------|-----------------|
| Area       | 914317           | 114247            | 87.5% reduction |
| LUT        | 338854           | 26673             | 92.1% reduction |
| Registers  | 294883           | 37203             | 87.4% reduction |
| Setup Time | 1.52 (Slack Met) | 28.66 (Slack Met) |                 |
| Latency    | 17               | 54                | 218% increase   |
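The relative changes can be recomputed from the raw V1 and V2 values, always measuring against V1 as the baseline. A minimal sketch (`reduction_pct` and `increase_pct` are hypothetical helpers):

```python
def reduction_pct(before, after):
    """Percentage decrease relative to the 'before' value."""
    return (before - after) / before * 100


def increase_pct(before, after):
    """Percentage increase relative to the 'before' value."""
    return (after - before) / before * 100


area = reduction_pct(914317, 114247)
lut = reduction_pct(338854, 26673)
registers = reduction_pct(294883, 37203)
latency = increase_pct(17, 54)
```

Using V1 as the baseline for the latency as well keeps the column consistent: the change is (54 − 17) / 17, whereas dividing by 54 would measure it against V2.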

# PART 5. Trouble Shooting

# Fixing the Indexsum Calculation

## Problem

- No saturation handling
- Bit widths were not handled correctly

Code before the fix

```
always_ff @(posedge clk or negedge rstn) begin
    if (!rstn) begin
        for (int kk = 0; kk < 512; kk++) begin
            indexsum_re_reg[kk] <= '0;
            indexsum_im_reg[kk] <= '0;
        end
    end else if (valid_in) begin
        for (int kk = 0; kk < 512; kk++) begin
            indexsum_re_reg[kk] = index1_re[kk] + index2_re[kk];
            indexsum_im_reg[kk] = index1_im[kk] + index2_im[kk];
        end
    end
end
```

## Fix

- Changed to combinational logic
- Added saturation handling

Fixed code

```
always_comb begin
    for (int kk = 0; kk < 512; kk++) begin
        indexsum_re_reg[kk] = index1_re[kk] + index2_re[kk];
        indexsum_im_reg[kk] = index1_im[kk] + index2_im[kk];
        // Saturate if the result exceeds 6-bit range
        if (indexsum_re_reg[kk] > 6'h3F) begin
            indexsum_re_reg[kk] = 6'h3F;
        end
        if (indexsum_im_reg[kk] > 6'h3F) begin
            indexsum_im_reg[kk] = 6'h3F;
        end
    end
end
```
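The effect of the saturation can be illustrated with a small behavioural model. The sketch assumes unsigned 6-bit index values, as the `6'h3F` limit suggests; `wrap_add` and `sat_add` are hypothetical names. The key point is that the sum must be formed at a wider width than 6 bits before it is compared against the limit, otherwise the overflow is already lost.

```python
def wrap_add(a, b, bits=6):
    """Addition truncated to 'bits' bits: the overflow silently wraps around."""
    return (a + b) & ((1 << bits) - 1)


def sat_add(a, b, bits=6):
    """Addition clamped to the largest unsigned value that fits in 'bits' bits."""
    limit = (1 << bits) - 1
    total = a + b  # full-precision sum, one bit wider than the operands
    return min(total, limit)
```

For operands 40 and 30, the wrapped result is 6 while the saturated result is 63.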

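# Handling Negative Values in the Shift Operation

## Problem

- A problem occurred when the shift operation had to handle a negative shift value
- The bit-shift algorithm had been designed to handle only positive shift values

## Fix

- Modified so that a positive shift value performs a left shift and a negative shift value performs a right shift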

```
// Before: only positive shift values are handled
shift_val_re[i] = 9 - indexsum_re[i];
if (indexsum_re[i] >= 6'd23)
    re_bfly22_pipe[i] <= '0;
else if (shift_val_re[i] > 0)
    re_bfly22_pipe[i] <= bfly22_tmp_re[i] <<< shift_val_re[i];
else
    re_bfly22_pipe[i] <= bfly22_tmp_re[i];
```



```
// After: a negative shift value shifts right by its magnitude
// (>>> is arithmetic only when bfly22_tmp_re is declared signed)
shift_val_re[i] = 9 - indexsum_re[i];
if (indexsum_re[i] >= 6'd23)
    re_bfly22_pipe[i] <= '0;
else if (shift_val_re[i] > 0)
    re_bfly22_pipe[i] <= bfly22_tmp_re[i] <<< shift_val_re[i];
else if (shift_val_re[i] < 0)
    re_bfly22_pipe[i] <= bfly22_tmp_re[i] >>> -shift_val_re[i];
else
    re_bfly22_pipe[i] <= bfly22_tmp_re[i];
```
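The corrected behaviour can be mirrored in a short model that follows the same branches. The constants 9 and 23 are taken from the listing above; the function name `scale_output` and its parameter names are hypothetical, and Python's `>>` on integers stands in for the arithmetic right shift.

```python
def scale_output(value, indexsum, reference=9, zero_at=23):
    """Apply the final shift: left for positive shift values, right for negative."""
    if indexsum >= zero_at:
        return 0                      # output is zeroed at or above this index sum
    shift_val = reference - indexsum
    if shift_val > 0:
        return value << shift_val     # scale up
    if shift_val < 0:
        return value >> -shift_val    # scale down, sign preserved
    return value                      # no shift needed
```

With an index sum of 11 the shift value is -2, so -20 becomes -5; before the fix this case passed the value through unshifted.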

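# Registers Removed During Synthesis Optimization

## Problem

- Values meant to be stored as constants were declared as `logic`, so registers that were actually needed were removed during synthesis optimization.
- As a result, the synthesis result was empty even though synthesis ran to completion.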

```
logic signed [COEFF_WIDTH-1:0] coeffs [0:TAP_NUM-1] = {  
    16'sd0, -16'sd1, 16'sd1, 16'sd0, -16'sd1, 16'sd2, 16'sd0, -16'sd2,  
    16'sd2, 16'sd0, -16'sd6, 16'sd8, 16'sd10, -16'sd28, -16'sd14, 16'sd111,  
    16'sd196, 16'sd111, -16'sd14, -16'sd28, 16'sd10, 16'sd8, -16'sd6, 16'sd0,  
    16'sd2, -16'sd2, 16'sd0, 16'sd2, -16'sd1, 16'sd0, 16'sd1, -16'sd1,  
    16'sd0  
};
```

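## Fix

- Resolved by declaring the coefficients as a `parameter` instead of `logic`.

Synthesis log excerpt showing the register removal (OPT-1207):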

```
Information: The register 'shift_reg_reg[27][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[27][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[26][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[25][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[24][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[24][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[23][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[23][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[22][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[22][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[21][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[21][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[20][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[20][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[19][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[19][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[18][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[18][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[17][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[17][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[16][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[16][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[15][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[15][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[14][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[14][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[13][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[13][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[12][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[12][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[11][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[11][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[10][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[10][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[9][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[9][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[8][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[8][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[7][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[7][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[6][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[5][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[4][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[4][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[3][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[2][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[1][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[0][6]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[32][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[32][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[31][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[31][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[30][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[30][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[29][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[29][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[28][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[28][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[27][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[27][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[26][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[26][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[25][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[24][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[24][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[23][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[23][5]' will be removed. (OPT-1207)  
Information: The register 'shift_reg_reg[22][5]' will be removed. (OPT-1207)
```

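# PART 6. Discussion

# Discussion

- Bit-width mismatches

Differences in signal bit widths made debugging difficult during module instantiation and simulation.

-> Recognized the importance of defining data widths at the start.

- Design style differences: inconsistent coding styles among team members required extra coordination during integration.

-> Confirmed the need to establish a coding convention.

- Better requirements analysis

Insufficient interpretation of the initial specification delayed setting the design direction. -> Requirements need to be organized and documented more thoroughly.

**Thank you.**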