

# **Design and Implementation of 64-point Fast Fourier Transform Chip for OFDM-based 802.11a WLAN in 45nm Lithography**

**December 2019**

**Bagus Hanindhito  
Reshma Rajarama Nayak  
Shvetha Senthil Kumar**

---

Graduate Students, Department of Electrical and Computer Engineering,  
The University of Texas at Austin, Austin, USA

# Introduction

# Outline

- Introduction
- Mathematical Explanation
- Circuit Architecture
- Functional Simulation
- HDL Synthesis
- Automatic Place and Route
- Conclusion
- References

# Background

- IEEE 802.11a standard requires that each 64-point FFT/IFFT operation should be completed within **4μs** for the signal transmitted at 20 MHz bandwidth.
- Typical Cooley–Tukey radix-2 64-point FFT computation requires 192 butterfly complex operations.
  - Consumes memory to store the complex twiddle factors and complex intermediate data.
  - Complicated addressing logic and control circuitry.

## Background (cont'd)

- Single butterfly architecture requires each butterfly computation to be done in 20.8ns, resulting in 48MHz clock frequency.
  - High power dissipation of the chip.
- Multiple parallel butterfly architecture may be needed to run the circuit at 20Mhz frequency (the same as the transmission signal bandwidth).
  - Increases area significantly.
  - Multiple butterfly units increase power dissipation.
- Need an alternate architecture which has good balance in performance, power, and area.
  - Express the 64-point FFT/IFFT using two-dimensional 8-point FFT/IFFT.

# Mathematical Explanation

# The Math Behind

- A Discrete Fourier Transform (DFT)  $A(r)$  from a series of  $N$  complex data
- $B(k)$  can be written as follows.
  - $A(r) = \sum_{k=0}^{N-1} B(k) W_N^{rk}$  where  $r, k \in \{0, 1, \dots, N - 1\}$
  - $W_N = e^{-2\pi j/N}$  which is  $N$ -point Twiddle Factor.
- If we substitute the variable with  $N = MT$ ,  $r = s + Tt$ , and  $k = l + Mm$  for  $s, l \in \{0, 1, \dots, 7\}$  and  $m, t \in \{0, 1, \dots, T - 1\}$ , we can rewrite the equation as follows.
  - $A(s + Tt) = \sum_{l=0}^{M-1} W_M^{lt} \left[ W_{MT}^{sl} \sum_{m=0}^{T-1} B(l + Mm) W_T^{sm} \right]$

## The Math Behind (cont'd)

- For 64-point FFT/IFFT, we can decompose them into two-dimensional 8-point FFT/IFFT.
  - $A(s + 8t) = \sum_{l=0}^7 [W_{64}^{sl} \sum_{m=0}^7 B(l + 8m) W_8^{sm}] W_8^{lt}$
- $W_8^{sm}$  and  $W_8^{lt}$  are 8-point Twiddle Factor which consist of 8 constant complex values.
  - Only two of those values requires constant complex multiplier.
  - Implemented as shift-and-add, not as a "true" multiplier.
- $W_{64}^{sl}$  is 64-point interdimensional Twiddle Factor which consists of 64 constant complex values.
  - We can use eight of those values to construct all 64 values.
  - Implemented as shift-and-add, not as a "true" multiplier.

# Circuit Architecture

# Data Format

- We use Q4.12 Signed Fixed-Point format for both real data and imaginary data to form 32-bit complex data.
  - Largest value is **7.9997558593750** (0b0111111111111111)
  - Smallest value is **-8.00000000000000** (0b1000000000000000)
  - Resolution is **0.0002441406250** (0b0000000000000001)

## Complex Data Set (32-bit)

31...16 Real Data (16-bit)

15...0 Imaginary Data (16-bit)

# Block Diagram



# Input Circuit

- Perform real-imaginary swapping, data buffering, and serial-to-parallel conversion
  - Serial data is converted into 8 parallel data.
- When the input register bank is full, every 8th data is an input to the 8-point FFT unit
- Data shifted every cycle - avoids a parallel multiplexing scheme
- Because the multiplier may take more than one cycle to process the data, three additional buffers used
- Counter - controls the serial input to the data.  
Provides the trigger to the master control circuit



# Input Circuit (cont'd)

- Real-Imaginary swapping
  - Choose whether FFT or IFFT operation is performed.
  - FFT (mode 0) operation does not require the real and imaginary part to be swapped at the beginning and ending, but IFFT (mode 1) does.
- Data Buffering
  - The input data is serial from B(0) into B(63), but 8-point FFT block requires eight parallel data.
  - The buffer will buffer 64 serial data and group them into 8 sets each with 8 data.
  - Each set of data will become a set of input to the first stage of 8-point FFT block.

# Input Circuit (cont'd)

- Data set arrangement

- One Input data will be latched into Input Circuit per clock cycle, from B(0) to B(63) with a total of 64 data.
- Then, the data is grouped into 8 sets as shown below.
- Each set enters the 8-point FFT block per clock cycle.

| Set | Input Data                                              |
|-----|---------------------------------------------------------|
| 0   | $B(0), B(8), B(16), B(24), B(32), B(40), B(48), B(56)$  |
| 1   | $B(1), B(9), B(17), B(25), B(33), B(41), B(49), B(57)$  |
| 2   | $B(2), B(10), B(18), B(26), B(34), B(42), B(50), B(58)$ |
| 3   | $B(3), B(11), B(19), B(27), B(35), B(43), B(51), B(59)$ |
| 4   | $B(4), B(12), B(20), B(28), B(36), B(44), B(52), B(60)$ |
| 5   | $B(5), B(13), B(21), B(29), B(37), B(45), B(53), B(61)$ |
| 6   | $B(6), B(14), B(22), B(30), B(38), B(46), B(54), B(62)$ |
| 7   | $B(7), B(15), B(23), B(31), B(39), B(47), B(55), B(63)$ |

# 8-point FFT unit

- Two 8-point FFT units are instantiated: first stage and second stage.
  - Blue: Complex Adder, implemented as two 16-bit kogge-stone adder.
  - Green: Complex Constant Multiplier, implemented as shift-and-add using 32-bit kogge-stone adder and bit truncation unit.
- Only two of eight 8-point twiddle factors values require constant multiplication.
  - square root of 2. (shift-and-add)
  - half of square root of 2. (shift-and-add)



# 8-point FFT unit (cont'd)

- Constant Multiplier
  - Implemented only as a shift-and-add, no need for "true" multiplier.

$\text{sqrt}(2)$



$\text{sqrt}(2)/2$



# 8-point FFT unit (cont'd)

- Four important Twiddle Factors
  - The other can be performed by inverse (multiplying with -1), or perform two's complement.

|         |                                              |          |
|---------|----------------------------------------------|----------|
| $W_N^0$ | 1                                            | $W_N^0$  |
| $W_N^1$ | $\frac{\sqrt{2}}{2} - j \frac{\sqrt{2}}{2}$  | $W_N^1$  |
| $W_N^2$ | $-j1$                                        | $W_N^2$  |
| $W_N^3$ | $-\frac{\sqrt{2}}{2} - j \frac{\sqrt{2}}{2}$ | $W_N^3$  |
| $W_N^4$ | $-1$                                         | $-W_N^0$ |
| $W_N^5$ | $-\frac{\sqrt{2}}{2} + j \frac{\sqrt{2}}{2}$ | $-W_N^1$ |
| $W_N^6$ | $j1$                                         | $-W_N^2$ |
| $W_N^7$ | $\frac{\sqrt{2}}{2} + j \frac{\sqrt{2}}{2}$  | $-W_N^3$ |



# Interdimensional multiplier

- The output from the first 8-point FFT unit is multiplied with 64-point interdimensional twiddle factor.
- Out of 64 values, there are nine unique values that can be used to construct other values.
  - (1,0) [bypass circuit]
  - (0.995178, 0.097961)
  - (0.980773, 0.195068)
  - (0.956909, 0.290283)
  - (0.923828, 0.382629)
  - (0.881896, 0.471374)
  - (0.831420, 0.555541)
  - (0.773010, 0.634338)
  - (0.707092, 0.707092)

| Type | Input Operand | Pre-Processing                                       | Constant Operand | Result                   | Post Processing                       |
|------|---------------|------------------------------------------------------|------------------|--------------------------|---------------------------------------|
| 1    | $A + jB$      | NONE<br>$A + jB$                                     | $C + jD$         | $AC - BD + j(AD + BC)$   | NONE<br>$AC - BD + j(AD + BC)$        |
| 2    | $A + jB$      | IM_INVERT<br>$A - jB$                                | $C + jD$         | $AC + BD + j(AD - BC)$   | IM_INVERT<br>$AC + BD + j(-AD + BC)$  |
| 3    | $A + jB$      | RE_INVERT<br>$-A + jB$                               | $C + jD$         | $-AC - BD + j(-AD + BC)$ | IM_INVERT<br>$-AC - BD + j(AD - BC)$  |
| 4    | $A + jB$      | RE_INVERT<br>IM_INVERT<br>$-A - jB$                  | $C + jD$         | $-AC + BD + j(-AD - BC)$ | NONE<br>$-AC + BD + j(-AD - BC)$      |
| 5    | $A + jB$      | RE-IM<br>SWAP<br>$B + jA$                            | $C + jD$         | $-AD + BC + j(AC + BD)$  | RE_INVERT<br>$AD - BC + j(AC + BD)$   |
| 6    | $A + jB$      | RE-IM<br>SWAP<br>IM_INVERT<br>$B - jA$               | $C + jD$         | $AD + BC + j(-AC + BD)$  | NONE<br>$AD + BC + j(-AC + BD)$       |
| 7    | $A + jB$      | RE-IM<br>SWAP<br>RE_INVERT<br>$-B + jA$              | $C + jD$         | $-AD - BC + j(AC - BD)$  | NONE<br>$-AD - BC + j(AC - BD)$       |
| 8    | $A + jB$      | RE-IM<br>SWAP<br>RE_INVERT<br>IM_INVERT<br>$-B - jA$ | $C + jD$         | $AD - BC + j(-AC - BD)$  | RE_INVERT<br>$-AD + BC + j(-AC - BD)$ |

# Interdimensional multiplier (cont'd)

- The nine twiddle factors defined as constant C, C+S, and C-S.

| Constant   | C+jS                                     | C+S              | C                | C-S              |
|------------|------------------------------------------|------------------|------------------|------------------|
| Constant 0 | $1.000000000000000 + j0.000000000000000$ | 0001000000000000 | 0001000000000000 | 0001000000000000 |
| Constant 1 | $0.995184726672197 + j0.098017140329561$ | 0001000101111110 | 000011111101100  | 0000111001011011 |
| Constant 2 | $0.980785280403230 + j0.195090322016128$ | 0001001011010000 | 0000111110110001 | 0000110010010010 |
| Constant 3 | $0.956940335732209 + j0.290284677254462$ | 0001001111110101 | 0000111101010000 | 0000101010101011 |
| Constant 4 | $0.923879532511287 + j0.382683432365090$ | 0001001111010000 | 0000111011001000 | 0000100010101001 |
| Constant 5 | $0.881921264348355 + j0.471396736825998$ | 0001010110100111 | 0000111000011100 | 0000011010010010 |
| Constant 6 | $0.831469612302545 + j0.555570233019602$ | 0001011000110001 | 0000110101001110 | 0000010001101010 |
| Constant 7 | $0.773010453362737 + j0.634393284163645$ | 0001011010000101 | 0000110001011110 | 0000001000111000 |
| Constant 8 | $0.707106781186547 + j0.707106781186547$ | 0001011010100001 | 0000101101010000 | 0000000000000000 |

# Interdimensional multiplier (cont'd)

- Bypass Circuit
  - Use to perform multiplication with trivial twiddle factor.
  - $1.000000000000000 + j0.000000000000000$
  - $0.000000000000000 - j1.000000000000000$
  - $-1.000000000000000 + j0.000000000000000$
  - $0.000000000000000 + j1.000000000000000$



# Interdimensional multiplier (cont'd)

- Constant Multiplier Block
  - Use to perform multiplication with non-trivial twiddle factor.



# Interdimensional multiplier (cont'd)

- Constant Multiplier Block Operation Type
  - Use to select the pre-processing and post-processing method for operand

| TYPE | Pre Processing    |     |     | Post Processing   |     |     | Control Signal |
|------|-------------------|-----|-----|-------------------|-----|-----|----------------|
|      | RE-IM Interchange | RE  | IM  | RE-IM Interchange | RE  | IM  |                |
| 0    | NO                | POS | POS | NO                | POS | POS | 000            |
| 1    | NO                | POS | NEG | NO                | POS | NEG | 001            |
| 2    | NO                | NEG | POS | NO                | POS | NEG | 010            |
| 3    | No                | NEG | NEG | NO                | POS | POS | 011            |
| 4    | YES               | POS | POS | NO                | NEG | POS | 100            |
| 5    | YES               | POS | NEG | NO                | POS | POS | 101            |
| 6    | YES               | NEG | POS | NO                | POS | POS | 110            |
| 7    | YES               | NEG | NEG | NO                | NEG | POS | 111            |

# Interdimensional multiplier (cont'd)

- Set Computation
  - Interdimensional multiplier perform different mode of computation for each set of data given by the first stage of 8-point FFT block.
  - There are 8 sets of data and 8 data per set.
  - In case when more than one data in each set need to be multiplied using the same constant, the operation will take multiple cycle per set (e.g., Set 4 takes 4 cycle).

| SET   | DATA0  |      | DATA1  |      | DATA2  |      | DATA3  |      | DATA4  |      | DATA5  |      | DATA6  |      | DATA7  |      |
|-------|--------|------|--------|------|--------|------|--------|------|--------|------|--------|------|--------|------|--------|------|
|       | CONST  | TYPE |
| SET 0 | BYPASS | 1    |
| SET 1 | BYPASS | 1    | 1      | 1    | 2      | 1    | 3      | 1    | 4      | 1    | 5      | 1    | 6      | 1    | 7      | 1    |
| SET 2 | BYPASS | 1    | 2      | 1    | 4      | 1    | 6      | 1    | 8      | 1    | HOLD   | HL   | HOLD   | HL   | HOLD   | HL   |
| SET 3 | HOLD   | 1    | HOLD   | HL   | HOLD   | HL   | HOLD   | HL   | HOLD   | HL   | 6      | 5    | 4      | 5    | 2      | 5    |
| SET 4 | BYPASS | 1    | 3      | 1    | 6      | 1    | 7      | 5    | 4      | 5    | 1      | 5    | 2      | 7    | 5      | 7    |
| SET 4 | BYPASS | 1    | 4      | 1    | 8      | 1    | HOLD   | HL   | BYPASS | 5    | HOLD   | HL   | HOLD   | HL   | HOLD   | HL   |
|       | HOLD   | HL   | HOLD   | HL   | HOLD   | HL   | 4      | 5    | HOLD   | HL   | HOLD   | HL   | HOLD   | HL   | HOLD   | HL   |
|       | HOLD   | HL   | 4      | 7    | 8      | 7    | HOLD   | HL   |
| SET 5 | BYPASS | 1    | 5      | 1    | 6      | 5    | 1      | 5    | 4      | 7    | 7      | 3    | 2      | 3    | 3      | 2    |
| SET 6 | BYPASS | 1    | 6      | 1    | 4      | 5    | 2      | 7    | 8      | 7    | HOLD   | HL   | HOLD   | HL   | HOLD   | HL   |
| SET 7 | HOLD   | HL   | 2      | 3    | 4      | 2    | 6      | 6    |
| SET 8 | BYPASS | 1    | 7      | 1    | 2      | 5    | 5      | 7    | 4      | 3    | 3      | 2    | 6      | 6    | 1      | 1    |

# Interdimensional multiplier (cont'd)

- Complete Circuit
  - Consists of 8 Complex Constant Multiplier blocks and 8 Bypass Circuits.
  - There is a DFF in each Complex Constant Multiplier to facilitate multi-cycle computation (e.g., for the case of Set 2, Set 4, and Set 6)
  - There are 8 multiplexers and 8 demultiplexers to allow us to select which Constant Multiplier should be used for each 8 data in one set.

# Interdimensional multiplier (cont'd)

- Complete Circuit (cont'd)



# CB Circuit

- Buffer and prepare data for the second stage of 8-point FFT unit.



# Output Circuit

- Perform parallel-to-serial conversion and real-imaginary swapping.
  - Mode = 0 -> FFT mode output
  - Mode = 1 -> IFFT mode output
- Output Register Bank is of length 57
- The 6-bit control counter counts through 64 states and asserts the control signal after 64 states to show that the output data is valid



# State Machine

- Input Counter:
  - 6-bit counter that counts the number of serial data buffered in Input Circuit
  - Trigger the Master Control if the data is ready.
- Master Control:
  - Control everything in the circuit.
  - Consists of 23 states, thus assuming the input circuit is already full, each FFT/IFFT operation can be completed in 23 clock cycle.
- Output Counter:
  - 6-bit counter that counts the number of serial data sent out from Output Circuit.
  - Assert the Data Valid signal.

# Computation Timeline



## Computation Timeline (cont'd)

- Consecutive computation can be triggered after 70 clock cycles of previous Data\_Start signal assertion
  - Indicated by Next\_Data signal.
  - Next computation can be triggered one clock cycle after Next\_Data signal changes from low to high.
- Therefore, effectively, one computation can be performed for 70 clock cycles.
  - The time needed to get all of the result out of the output circuit can be overlapped by the time needed for new data to get buffered into input circuit.
  - At 200Mhz, each FFT/IFFT operation can be completed at **0.35μs**.
  - At 125Mhz, each FFT/IFFT operation can be completed at **0.56μs**.
  - At 20Mhz (802.11a baseband), each FFT/IFFT operation can be completed at **3.5μs**.
  - 802.11a requires each FFT/IFFT to be completed at **4.0μs. (MET)**

# Functional Simulation

# Testing Methodology

- We used DVE for functional simulation to verify our circuit.
  - Compare of the results with MATLAB program : MATLAB program for computation of FFT/IFFT is written and design results are compared with the MATLAB result.
- Test Architecture
  - The data is first passed through an FFT instance of the design followed by an IFFT instance of the design.
  - Comparison is performed to check equivalence between data input and data output.



# Test 1

- Input
    - $x[0] = 0x00290000$   
(Corresponds to  $0.01 + j 0.00$ )
    - $x[1..63] = 0x00000000$   
(Corresponds to  $0.00 + j 0.00$ )
  - FFT Output (from Matlab):
    - $X[0..63] = 0x00290000$   
(Corresponds to  $0.01 + j 0.00$ )

```
>> FFT64Point  
X[1] 00290000  
X[2] 00290000  
X[3] 00290000  
X[4] 00290000  
X[5] 00290000  
X[6] 00290000  
X[7] 00290000  
X[8] 00290000  
X[9] 00290000  
X[10] 00290000  
X[11] 00290000  
X[12] 00290000  
X[13] 00290000  
X[14] 00290000  
X[15] 00290000  
X[16] 00290000  
X[17] 00290000  
X[18] 00290000  
X[19] 00290000  
X[20] 00290000  
X[21] 00290000  
X[22] 00290000  
X[23] 00290000  
X[24] 00290000  
X[25] 00290000  
X[26] 00290000  
X[27] 00290000  
X[28] 00290000  
X[29] 00290000  
X[30] 00290000  
X[31] 00290000  
X[32] 00290000
```

New to MATLAB? See resources for [Getting Started](#).

```
X[32] 00290000
X[33] 00290000
X[34] 00290000
X[35] 00290000
X[36] 00290000
X[37] 00290000
X[38] 00290000
X[39] 00290000
X[40] 00290000
X[41] 00290000
X[42] 00290000
X[43] 00290000
X[44] 00290000
X[45] 00290000
X[46] 00290000
X[47] 00290000
X[48] 00290000
X[49] 00290000
X[50] 00290000
X[51] 00290000
X[52] 00290000
X[53] 00290000
X[54] 00290000
X[55] 00290000
X[56] 00290000
X[57] 00290000
X[58] 00290000
X[59] 00290000
X[60] 00290000
X[61] 00290000
X[62] 00290000
X[63] 00290000
X[64] 00290000
>>
>>
>>
>>
```

# Test 1 (cont'd)

- Input Waveform



# Test 1 (cont'd)

- FFT Stage Output



- IFFT Stage Output



# Test 2

- Input
  - $x[0..63] = 0x00290000$   
 (Corresponds to  $0.01 + j 0.00$ )
- FFT Output (from Matlab):
  - $X[0] = 0x00A40000$   
 (Corresponds to  $0.04 + j 0.00$ )
  - $X[1..63] = 0x00000000$   
 (Corresponds to  $0.00 + j 0.00$ )



```

New to MATLAB? See resources for Getting Started.
>> FFT64Point
X[1] 0A400000
X[2] 00000000
X[3] 00000000
X[4] 00000000
X[5] 00000000
X[6] 00000000
X[7] 00000000
X[8] 00000000
X[9] 00000000
X[10] 00000000
X[11] 00000000
X[12] 00000000
X[13] 00000000
X[14] 00000000
X[15] 00000000
X[16] 00000000
X[17] 00000000
X[18] 00000000
X[19] 00000000
X[20] 00000000
X[21] 00000000
X[22] 00000000
X[23] 00000000
X[24] 00000000
X[25] 00000000
X[26] 00000000
X[27] 00000000
X[28] 00000000
X[29] 00000000
X[30] 00000000
X[31] 00000000
X[32] 00000000

```



```

New to MATLAB? See resources for Getting Started.
>>
>>
>>
>>
X[33] 00000000
X[34] 00000000
X[35] 00000000
X[36] 00000000
X[37] 00000000
X[38] 00000000
X[39] 00000000
X[40] 00000000
X[41] 00000000
X[42] 00000000
X[43] 00000000
X[44] 00000000
X[45] 00000000
X[46] 00000000
X[47] 00000000
X[48] 00000000
X[49] 00000000
X[50] 00000000
X[51] 00000000
X[52] 00000000
X[53] 00000000
X[54] 00000000
X[55] 00000000
X[56] 00000000
X[57] 00000000
X[58] 00000000
X[59] 00000000
X[60] 00000000
X[61] 00000000
X[62] 00000000
X[63] 00000000
X[64] 00000000
>>
>>
>>
>>

```

# Test 2 (cont'd)

- Input Waveform



# Test 2 (cont'd)

- FFT Stage Output



- IFFT Stage Output



# Test 3

- Input
    - $x[0] = 00290029$
    - $x[1] = 00000000$
    - .....
    - $x[62] = 00290029$
    - $x[63] = 00000000$
  - FFT Output (from Matlab):
    - $X[0] = 0x05200520$   
(Corresponds to  $0.32 + j 0.32$ )
    - $X[32] = 0x05200520$   
(Corresponds to  $0.32 + j 0.32$ )
    - $X[Others] = 0x00000000$   
(Corresponds to  $0.00 + j 0.00$ )

New to MATLAB? See resources for [Getting Started](#)

```
>> FFT64Point
X[1] 05200520
X[2] 00000000
X[3] 00000000
X[4] 00000000
X[5] 00000000
X[6] 00000000
X[7] 00000000
X[8] 00000000
X[9] 00000000
X[10] 00000000
X[11] 00000000
X[12] 00000000
X[13] 00000000
X[14] 00000000
X[15] 00000000
X[16] 00000000
X[17] 00000000
X[18] 00000000
X[19] 00000000
X[20] 00000000
X[21] 00000000
X[22] 00000000
X[23] 00000000
X[24] 00000000
X[25] 00000000
X[26] 00000000
X[27] 00000000
X[28] 00000000
X[29] 00000000
X[30] 00000000
X[31] 00000000
X[32] 00000000
```

```
X[32] 00000000
X[33] 05200520
X[34] 00000000
X[35] 00000000
X[36] 00000000
X[37] 00000000
X[38] 00000000
X[39] 00000000
X[40] 00000000
X[41] 00000000
X[42] 00000000
X[43] 00000000
X[44] 00000000
X[45] 00000000
X[46] 00000000
X[47] 00000000
X[48] 00000000
X[49] 00000000
X[50] 00000000
X[51] 00000000
X[52] 00000000
X[53] 00000000
X[54] 00000000
X[55] 00000000
X[56] 00000000
X[57] 00000000
X[58] 00000000
X[59] 00000000
X[60] 00000000
X[61] 00000000
X[62] 00000000
X[63] 00000000
X[64] 00000000
>>
>>
>>
>>
```

# Test 3 (cont'd)

- Input Waveform



# Test 3 (cont'd)

- FFT Stage Output



# Test 3 (cont'd)

- IFFT Stage Output



# Test 4

| Data | Input → FFT Block | FFT Block → IFFT Block | IFFT Block → Output | Data | Input → FFT Block | FFT Block → IFFT Block | IFFT Block → Output |
|------|-------------------|------------------------|---------------------|------|-------------------|------------------------|---------------------|
| 0    | 00050107          | 21682168               | 00050107            | 32   | 00880084          | ff7e0082               | 00880084            |
| 1    | 00090103          | 09ec0aec               | 00080102            | 33   | 008C007F          | ff7a007c               | 008c007e            |
| 2    | 000D00FE          | 04b005b5               | 000c00fe            | 34   | 0090007B          | ff6c0077               | 0090007b            |
| 3    | 001100FA          | 02f503fc               | 001100fa            | 35   | 00940077          | ff6b0070               | 00930076            |
| 4    | 001500F6          | 02100317               | 001500f6            | 36   | 00980073          | ff640069               | 00970072            |
| 5    | 001900F2          | 01870291               | 001800f1            | 37   | 009C006F          | ff590065               | 009c006f            |
| 6    | 001D00EE          | 012d0232               | 001d00ed            | 38   | 00A0006B          | ff55005c               | 00a0006a            |
| 7    | 002100EA          | 00ea01f3               | 002100ea            | 39   | 00A40067          | ff4e0051               | 00a30067            |
| 8    | 002500E6          | 00ba01c0               | 002500e5            | 40   | 00A80063          | ff44004e               | 00a70063            |
| 9    | 002900E2          | 00940199               | 002900e2            | 41   | 00AD005F          | ff3c0045               | 00ac005e            |
| 10   | 002E00DE          | 0073017a               | 002e00dd            | 42   | 00B1005B          | ff37003c               | 00b1005a            |
| 11   | 003200DA          | 0058015e               | 003200d9            | 43   | 00B50057          | ff2c0036               | 00b40057            |
| 12   | 003600D5          | 0044014d               | 003500d4            | 44   | 00B90052          | ff24002b               | 00b90052            |
| 13   | 003A00D1          | 002d0132               | 003a00d1            | 45   | 00BD004E          | ff1b0024               | 00bc004e            |
| 14   | 003E00CD          | 001c0123               | 003d00cd            | 46   | 00C1004A          | ff120017               | 00c10049            |
| 15   | 004200C9          | 000f0116               | 004200c8            | 47   | 00C50046          | ff05000a               | 00c40046            |
| 16   | 004600C5          | 00000106               | 004600c5            | 48   | 00C90042          | fefa0000               | 00c90042            |
| 17   | 004A00C1          | fff500fb               | 004a00c1            | 49   | 00CD003E          | fee9fff1               | 00cc003d            |
| 18   | 004E00BD          | ffe900f4               | 004e00bd            | 50   | 00D1003A          | fedbfffe4              | 00d0003a            |
| 19   | 005200B9          | ffde00e3               | 005100b9            | 51   | 00D50036          | fecafffd1              | 00d50035            |
| 20   | 005700B5          | ffd500d8               | 005700b5            | 52   | 00DA0032          | febfffcc0              | 00d90032            |
| 21   | 005B00B1          | ffce00d0               | 005b00b0            | 53   | 00DE002E          | fea2ffa6               | 00dd002d            |
| 22   | 005F00AD          | ffc200c9               | 005f00ac            | 54   | 00E20029          | fe88ff8d               | 00e10029            |
| 23   | 006300A8          | ffbd00c0               | 006200a8            | 55   | 00E60025          | fe67ff70               | 00e50025            |
| 24   | 006700A4          | ffb200bc               | 006600a4            | 56   | 00EA0021          | fe40ff46               | 00ea0020            |
| 25   | 006B00A0          | ffb000af               | 006b009f            | 57   | 00EE001D          | fe0cff17               | 00ee001d            |
| 26   | 006F009C          | ffa700ab               | 006e009c            | 58   | 00F20019          | fddffedb               | 00f20018            |
| 27   | 00730098          | ff9d00a4               | 00720098            | 59   | 00F60015          | fd6ffe78               | 00f60014            |
| 28   | 00770094          | ff9500a1               | 00760094            | 60   | 00FA0011          | fcebffff               | 00fa0010            |
| 29   | 007B0090          | ff900097               | 007b0090            | 61   | 00FE000D          | fc08fd0f               | 00fe000c            |
| 30   | 007F008C          | ff8a0092               | 007f008c            | 62   | 01030009          | fa4cfb50               | 01020009            |
| 31   | 00840088          | ff840087               | 00840087            | 63   | 01070005          | f514f615               | 01070004            |

## Test 4 (cont'd)

- Rounding error makes the output of the IFFT block slightly different than the input to the FFT block.
  - Rounding is done after each constant multiplication is performed.
  - Error from rounding is around 1 or 2 LSB where most of them differ at 1 LSB.
  - Constant multiplier produce 32-bit output and must be truncated to 16-bit data.
  - Need to investigate a better rounding circuit to reduce error.

# HDL Synthesis

# HDL Synthesis

- We used Design Compiler to Synthesize our design.
  - Target clock is 200Mhz.
  - We will reduce the target clock to 125MHz during Automatic Place and Route.
- Design Compiler produces gate-level netlist and SDC for Innovus to do Automatic Place and Route.

# Initial Synthesis

```

clock pclk (rise edge)          5.0000  5.0000
clock network delay (ideal)    0.0000  5.0000
cb_unit/dff_segment_for_cb_inst6/dff_inputsel_hold_sync_high_reset_inst1/dff_hold_sync_high_reset_inst0/dff_sync_high_reset_inst0/Q_reg[11]/CLK
(DFFPOSX1)
                                0.0000  5.0000 r
library setup time             -0.0557  4.9443
data required time              4.9443
-----
data required time              4.9443
data arrival time               -4.9443
-----
slack (MET)                   0.0001

```

|                                |                                    |
|--------------------------------|------------------------------------|
| Number of ports:               | 685916                             |
| Number of nets:                | 861036                             |
| Number of cells:               | 230321                             |
| Number of combinational cells: | 169030                             |
| Number of sequential cells:    | 6438                               |
| Number of macros/black boxes:  | 0                                  |
| Number of buf/inv:             | 54342                              |
| Number of references:          | 11                                 |
|                                |                                    |
| Combinational area:            | 422070.107278                      |
| Buf/Inv area:                  | 77055.304258                       |
| Noncombinational area:         | 51363.006660                       |
| Macro/Black Box area:          | 0.000000                           |
| Net Interconnect area:         | undefined (No wire load specified) |
|                                |                                    |
| Total cell area:               | 473433.113939                      |
| Total area:                    | undefined                          |

# Synthesis Optimization

```

clock pclk (rise edge)          5.00000  5.00000
clock network delay (ideal)    0.00000  5.00000
cb_unit/dff_segment_for_cb_inst6/dff_inputsel_hold_sync_high_reset_inst2/dff_hold_sync_high_reset_inst0/dff_sync_high_reset_inst0/Q_reg[30]/CLK
(DFFPOSX1)
                                         0.00000  5.00000 r
library setup time                -0.05567  4.94433
data required time                 4.94433
-----
data required time                 4.94433
data arrival time                  -4.94418
-----
slack (MET)                      0.00015

```

|                                |                                    |
|--------------------------------|------------------------------------|
| Number of ports:               | 685916                             |
| Number of nets:                | 860953                             |
| Number of cells:               | 230238                             |
| Number of combinational cells: | 168947                             |
| Number of sequential cells:    | 6438                               |
| Number of macros/black boxes:  | 0                                  |
| Number of buf/inv:             | 54323                              |
| Number of references:          | 11                                 |
|                                |                                    |
| Combinational area:            | 421970.146392                      |
| Buf/Inv area:                  | 77021.514660                       |
| Noncombinational area:         | 51363.006660                       |
| Macro/Black Box area:          | 0.000000                           |
| Net Interconnect area:         | undefined (No wire load specified) |
|                                |                                    |
| Total cell area:               | 473333.153052                      |
| Total area:                    | undefined                          |

# Automatic Place and Route

# Overview

- Performed Automatic Place and Route using **Cadence Innovus 18.1** Implementation System.
  - Encounter license available on LRC Machine does not support design with number of instance more than 50,000.
- Use 125MHz (8ns clock period) as target speed.
  - In DC, we can achieve 200MHz.
  - To simplify our PNR effort, we reduce the target speed to 125MHz.
- Use FreePDK45 and its timing library for PNR.
  - Multi-mode multi-corner (MMMC) analysis with only one corner (typical).
  - Timing library used is gscl45nm.lib

# MMMC Setup

- Create “default” RC Corner and Library Set.
  - All settings are initialized default by Innovus.
  - Single library set gscl45nm.lib
- Create operating condition “typical”
  - Only single operating condition is defined in gscl45nm.lib
  - PVT is 1.0, 1.1, and 27 respectively.
- Create delay corner with typical operating condition and default RC Corner
- Create constraint based on SDC file generated by DC.
- Create analysis view based on the delay corner.
  - Same analysis view for both Setup and Hold

# Initialize the Design

- Verilog Gate Level Netlist
  - fft\_64p\_16b\_top.gatelevel.v
- MMMC File
  - Default.view (created on the previous slide)
- LEF File
  - gscl45nm.lef
- Power Net and GND Net
  - Power Net is vdd
  - GND Net is gnd

# Timing Sanity Check

- PrePlace Timing Report
  - Initial netlist time is clean.
  - We have some positive slack for setup time to play with during APR.

```
-----  
timeDesign Summary  
-----  
  
Setup views included:  
  default  
  
+-----+-----+-----+-----+  
|      Setup mode    |    all    | reg2reg | default |  
+-----+-----+-----+-----+  
|          WNS (ns) :|  2.104   |  2.104   |  6.571   |  
|          TNS (ns) :|  0.000   |  0.000   |  0.000   |  
| Violating Paths:|    0     |    0     |    0     |  
|      All Paths:  |  6438   |  6438   |  6438   |  
+-----+-----+-----+-----+  
  
Density: 0.000%  
-----
```

# Floorplanning

- Create Floorplan with Ratio 0.5, 85% Initial Utilization, 20 $\mu$ m core-to-I/O boundary.
  - Row spacing is 0.0.
  - Double-back Row.
- Place the pins around the floorplan.
  - Spread the In\_Stream[31:0] on the left and Out\_Stream[31:0] on the right.
  - The rest of the pins are spread on the top.

# Floorplanning (cont'd)



# Powerplanning

- Create Global Net Connect
  - vdd and gnd pgpin should be connected to net vdd and gnd, respectively.
  - tiehi and tielo should be connected to net vdd and gnd, respectively.
- Create Core Ring
  - Metal5 for horizontal ring, Metal6 for vertical ring.
- Create Core Straps
  - 32 sets of Metal6 Vertical Straps.
- Create Row Straps
  - Metal1 Horizontal Straps.
- Vias and stacked vias are created automatically.
  - e.g. from Metal1 horizontal straps to Metal6 vertical straps.

# Powerplanning (cont'd)



# Placement

- Place the standard cells.
  - Congestion effort is high.
  - Timing driven.
- Post-placement Timing
  - Some setup violations, can be fixed during Pre-CTS Optimization.
  - Some DRV violations on maximum capacitance, can be fixed during Pre-CTS Optimization.

| timeDesign Summary                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                |           |                |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|-----------|----------------|--|------------|------|---------|---------|--|----------------|-----------|----------------|-------|---------|-----------|--------|-----------|-------|----------|------------------|-------|-------|---|------------|------------|------|-------|------|------------|-------|---|-------|--|
| Setup views included:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |                |           |                |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
| default                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                |           |                |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
| <table border="1"> <thead> <tr> <th>Setup mode</th><th>all</th><th>reg2reg</th><th>default</th><th></th></tr> </thead> <tbody> <tr> <td>WNS (ns):</td><td>-0.035</td><td>-0.035</td><td>5.250</td><td></td></tr> <tr> <td>TNS (ns):</td><td>-0.065</td><td>-0.065</td><td>0.000</td><td></td></tr> <tr> <td>Violating Paths:</td><td>2</td><td>2</td><td>0</td><td></td></tr> <tr> <td>All Paths:</td><td>6438</td><td>6438</td><td>6438</td><td></td></tr> </tbody> </table>                                                                                            |                |           |                |  | Setup mode | all  | reg2reg | default |  | WNS (ns):      | -0.035    | -0.035         | 5.250 |         | TNS (ns): | -0.065 | -0.065    | 0.000 |          | Violating Paths: | 2     | 2     | 0 |            | All Paths: | 6438 | 6438  | 6438 |            |       |   |       |  |
| Setup mode                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | all            | reg2reg   | default        |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
| WNS (ns):                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | -0.035         | -0.035    | 5.250          |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
| TNS (ns):                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | -0.065         | -0.065    | 0.000          |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
| Violating Paths:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | 2              | 2         | 0              |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
| All Paths:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | 6438           | 6438      | 6438           |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
| <table border="1"> <thead> <tr> <th rowspan="2">DRVs</th><th colspan="2">Real</th><th colspan="2">Total</th></tr> <tr> <th>Nr nets(terms)</th><th>Worst Vio</th><th>Nr nets(terms)</th><th></th></tr> </thead> <tbody> <tr> <td>max_cap</td><td>125 (125)</td><td>-3.432</td><td>125 (125)</td><td></td></tr> <tr> <td>max_tran</td><td>0 (0)</td><td>0.000</td><td>0 (0)</td><td></td></tr> <tr> <td>max_fanout</td><td>0 (0)</td><td>0</td><td>0 (0)</td><td></td></tr> <tr> <td>max_length</td><td>0 (0)</td><td>0</td><td>0 (0)</td><td></td></tr> </tbody> </table> |                |           |                |  | DRVs       | Real |         | Total   |  | Nr nets(terms) | Worst Vio | Nr nets(terms) |       | max_cap | 125 (125) | -3.432 | 125 (125) |       | max_tran | 0 (0)            | 0.000 | 0 (0) |   | max_fanout | 0 (0)      | 0    | 0 (0) |      | max_length | 0 (0) | 0 | 0 (0) |  |
| DRVs                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | Real           |           | Total          |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | Nr nets(terms) | Worst Vio | Nr nets(terms) |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
| max_cap                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | 125 (125)      | -3.432    | 125 (125)      |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
| max_tran                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | 0 (0)          | 0.000     | 0 (0)          |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
| max_fanout                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | 0 (0)          | 0         | 0 (0)          |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
| max_length                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | 0 (0)          | 0         | 0 (0)          |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
| Density: 82.649%                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |                |           |                |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |
| Routing Overflow: 0.00% H and 0.00% V                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |                |           |                |  |            |      |         |         |  |                |           |                |       |         |           |        |           |       |          |                  |       |       |   |            |            |      |       |      |            |       |   |       |  |

# Placement (cont'd)



# Placement (cont'd)



# Pre-CTS Optimization

- Optimize placement to fix setup time and DRV Violation.
  - Use useful skew if possible.
  - Effort is high.
  - Area reclaim.
- Pre-CTS Optimization timing
  - All of violations are fixed.
  - Density decreases because of area reclaim.

| optDesign Final Summary               |                |           |                |  |
|---------------------------------------|----------------|-----------|----------------|--|
| Setup views included:                 |                |           |                |  |
| default                               |                |           |                |  |
| <hr/>                                 |                |           |                |  |
| Setup mode                            | all            | reg2reg   | default        |  |
| WNS (ns):                             | 0.921          | 0.921     | 6.113          |  |
| TNS (ns):                             | 0.000          | 0.000     | 0.000          |  |
| Violating Paths:                      | 0              | 0         | 0              |  |
| All Paths:                            | 6406           | 6406      | 6406           |  |
| <hr/>                                 |                |           |                |  |
| <hr/>                                 |                |           |                |  |
| Real                                  |                |           |                |  |
| Total                                 |                |           |                |  |
| DRVs                                  | Nr nets(terms) | Worst Vio | Nr nets(terms) |  |
| max_cap                               | 0 (0)          | 0.000     | 0 (0)          |  |
| max_tran                              | 0 (0)          | 0.000     | 0 (0)          |  |
| max_fanout                            | 0 (0)          | 0         | 0 (0)          |  |
| max_length                            | 0 (0)          | 0         | 0 (0)          |  |
| <hr/>                                 |                |           |                |  |
| Density: 79.433%                      |                |           |                |  |
| Routing Overflow: 0.00% H and 0.00% V |                |           |                |  |
| <hr/>                                 |                |           |                |  |

# Pre-CTS Optimization (cont'd)



# Clock Tree Synthesis

- Synthesize clock tree using CCOPT.
  - Perform clock tree synthesis followed by Post-CTS Optimization.
  - Only use CLKBUF1, CLKBUF2, and CLKBUF3 for CCOPT.
- Post-CTS Timing.
  - Some setup and hold violations, can be fixed during another session of Post-CTS optimization.

timeDesign Summary

Setup views included:

| Setup mode       | all    | reg2reg | default |
|------------------|--------|---------|---------|
| WNS (ns):        | -0.002 | -0.002  | 1.490   |
| TNS (ns):        | -0.003 | -0.003  | 0.000   |
| Violating Paths: | 2      | 2       | 0       |
| All Paths:       | 6406   | 6406    | 6406    |

Hold views included:

| Hold mode        | all    | reg2reg | default |
|------------------|--------|---------|---------|
| WNS (ns):        | -0.117 | -0.117  | 0.066   |
| TNS (ns):        | -1.397 | -1.397  | 0.000   |
| Violating Paths: | 73     | 73      | 0       |
| All Paths:       | 6406   | 6406    | 6406    |

| DRVs       | Real           |           | Total          |
|------------|----------------|-----------|----------------|
|            | Nr nets(terms) | Worst Vio | Nr nets(terms) |
| max_cap    | 0 (0)          | 0.000     | 0 (0)          |
| max_tran   | 0 (0)          | 0.000     | 0 (0)          |
| max_fanout | 0 (0)          | 0         | 0 (0)          |
| max_length | 0 (0)          | 0         | 0 (0)          |

Density: 83.184%

Routing Overflow: 0.00% H and 0.00% V

# Clock Tree Synthesis (cont'd) [clock tree route]



# Clock Tree Synthesis (cont'd) [added cell]



# Post-CTS Optimization

- Fixing Setup time and Hold time violation.
  - Density decreases slightly because of Area Reclaim.

```
-----  
timeDesign Summary  
-----  
  
Setup views included:  
 default  
+-----+-----+-----+-----+  
| Setup mode | all | reg2reg | default |  
+-----+-----+-----+-----+  
| WNS (ns) :| 0.040 | 0.040 | 1.474 |  
| TNS (ns) :| 0.000 | 0.000 | 0.000 |  
| Violating Paths:| 0 | 0 | 0 |  
| All Paths:| 6406 | 6406 | 6406 |  
+-----+-----+-----+-----+  
  
Hold views included:  
 default  
+-----+-----+-----+-----+  
| Hold mode | all | reg2reg | default |  
+-----+-----+-----+-----+  
| WNS (ns) :| 0.000 | 0.000 | 0.065 |  
| TNS (ns) :| 0.000 | 0.000 | 0.000 |  
| Violating Paths:| 0 | 0 | 0 |  
| All Paths:| 6406 | 6406 | 6406 |  
+-----+-----+-----+-----+  
  
+-----+-----+-----+  
| | Real | Total |  
| DRVs +-----+-----+  
| | Nr nets(terms) | Worst Vio | Nr nets(terms) |  
+-----+-----+-----+  
| max_cap | 0 (0) | 0.000 | 0 (0) |  
| max_tran | 0 (0) | 0.000 | 0 (0) |  
| max_fanout | 0 (0) | 0 | 0 (0) |  
| max_length | 0 (0) | 0 | 0 (0) |  
+-----+-----+-----+  
  
Density: 82.913%  
Routing Overflow: 0.00% H and 0.00% V  
-----
```

# Post-CTS Optimization (cont'd)



# Routing

- Route all the signal nets using NanoRoute.
- First phase of routing is balanced between timing-driven and congestion-driven.
- Second phase of routing is both timing-driven and SI-driven.
- Setup violations and hold violations can be fixed on Post-Route Optimization.

| timeDesign Summary    |                  |           |                |         |
|-----------------------|------------------|-----------|----------------|---------|
| Setup views included: |                  |           |                |         |
| default               | Setup mode       | all       | reg2reg        | default |
|                       | WNS (ns):        | -1.082    | -1.082         | 1.959   |
|                       | TNS (ns):        | -125.865  | -125.865       | 0.000   |
|                       | Violating Paths: | 624       | 624            | 0       |
|                       | All Paths:       | 6406      | 6406           | 6406    |
| Hold views included:  |                  |           |                |         |
| default               | Hold mode        | all       | reg2reg        | default |
|                       | WNS (ns):        | -0.039    | 0.564          | -0.039  |
|                       | TNS (ns):        | -0.084    | 0.000          | -0.084  |
|                       | Violating Paths: | 3         | 0              | 3       |
|                       | All Paths:       | 6406      | 6406           | 6406    |
|                       |                  |           |                |         |
| Real Total            |                  |           |                |         |
| DRVs                  | Nr nets(terms)   | Worst Vio | Nr nets(terms) |         |
| max_cap               | 0 (0)            | 0.000     | 0 (0)          |         |
| max_tran              | 0 (0)            | 0.000     | 0 (0)          |         |
| max_fanout            | 0 (0)            | 0         | 0 (0)          |         |
| max_length            | 0 (0)            | 0         | 0 (0)          |         |

Density: 82.913%  
Total number of glitch violations: 0

# Routing (cont'd)



# Post-Routing Optimization

- Setup Fixing and Hold Fixing.
- Glitch and Signal Integrity Fixing.
- Add Fillers Cell
- Perform Verify DRC, Verify Geometry, and Verify Connectivity.

```

-----+-----+-----+
timeDesign Summary

-----+-----+-----+
Setup views included:
default
-----+-----+-----+
|   Setup mode | all | reg2reg | default |
-----+-----+-----+
|       WNS (ns):| 0.079 | 0.079 | 2.085 |
|       TNS (ns):| 0.000 | 0.000 | 0.000 |
| Violating Paths:| 0 | 0 | 0 |
| All Paths:| 6406 | 6406 | 6406 |
-----+-----+-----+
-----+-----+-----+
Hold views included:
default
-----+-----+-----+
|   Hold mode | all | reg2reg | default |
-----+-----+-----+
|       WNS (ns):| 0.040 | 0.517 | 0.040 |
|       TNS (ns):| 0.000 | 0.000 | 0.000 |
| Violating Paths:| 0 | 0 | 0 |
| All Paths:| 6406 | 6406 | 6406 |
-----+-----+-----+
-----+-----+-----+
|                   |          Real          |      Total      |
|   DRVs           |          +-----+-----+-----|
|                   |          | Nr nets(terms) | Worst Vio | Nr nets(terms) |
-----+-----+-----+
|   max_cap        | 0 (0)    | 0.000     | 0 (0)    |
|   max_tran       | 0 (0)    | 0.000     | 0 (0)    |
|   max_fanout     | 0 (0)    | 0         | 0 (0)    |
|   max_length     | 0 (0)    | 0         | 0 (0)    |
-----+-----+-----+
-----+-----+-----+
Density: 83.524% (88.612% with Fillers)
Total number of glitch violations: 0
-----+

```

# Post-Routing Optimization (cont'd)

- Verify DRC
  - No violation detected.
- Verify Geometry
  - No violation detected.
- Verify Connectivity
  - No violation detected.

```

VERIFY DRC ..... Sub-Area: {553.280 151.840 632.320 227.760} 30 of 66 Thread : 21
VERIFY DRC ..... Thread : 21 finished.
VERIFY DRC ..... Sub-Area: {237.120 151.840 316.160 227.760} 26 of 66 Thread : 14
VERIFY DRC ..... Thread : 14 finished.
VERIFY DRC ..... Sub-Area: {395.200 151.840 474.240 227.760} 28 of 66 Thread : 19
VERIFY DRC ..... Thread : 19 finished.

```

Verification Complete : 0 Viols.

```

VERIFY GEOMETRY ..... SubArea : 22 of 28 Thread : 21
VERIFY GEOMETRY ..... SubArea : 23 of 28 Thread : 22
VERIFY GEOMETRY ..... SubArea : 24 of 28 Thread : 23
VG: elapsed time: 7.00
Begin Summary ...
  Cells      : 0
  SameNet    : 0
  Wiring     : 0
  Antenna    : 0
  Short      : 0
  Overlap    : 0
End Summary

Verification Complete : 0 Viols.  0 Wrngs.

```

```

***** Start: VERIFY CONNECTIVITY *****
Start Time: Fri Dec 13 09:17:32 2019

Design Name: fft_64p_16b_top
Database Units: 2000
Design Boundary: (0.0000, 0.0000) (863.5500, 450.3000)
Error Limit = 1000; Warning Limit = 50
Check all nets
Use 24 pthreads

Begin Summary
  Found no problems or warnings.
End Summary

End Time: Fri Dec 13 09:17:34 2019
Time Elapsed: 0:00:02.0

```

\*\*\*\*\* End: VERIFY CONNECTIVITY \*\*\*\*\*

# Post-Routing Optimization (cont'd)



# Post-Routing Optimization (cont'd)



# Post-Routing Optimization (cont'd) [Timing Map]



# PrimeTime STA

- Use SPF extracted from Innovus.
- Timing MET
- Question
  - Why the slack is different in significant amount compare to Innovus timing report?
  - Did we set the timing library / RC value correctly for DC, PT, and Innovus?

```

fft_8p_firststage/csubtr_9/subt_ksa_16b_instreal/ksa_top_16b_inst0/s2/bc_11/U3/Y (AOI21X1)
                                         0.0359   4.1811 r
fft_8p_firststage/csubtr_9/subt_ksa_16b_instreal/ksa_top_16b_inst0/s2/bc_11/U2/Y (INVX1)
                                         0.0249   4.2061 f
fft_8p_firststage/csubtr_9/subt_ksa_16b_instreal/ksa_top_16b_inst0/s3/bc_11/U3/Y (AOI21X1)
                                         0.0360   4.2420 r
fft_8p_firststage/csubtr_9/subt_ksa_16b_instreal/ksa_top_16b_inst0/s3/bc_11/U1/Y (INVX1)
                                         0.0197   4.2617 f
fft_8p_firststage/csubtr_9/subt_ksa_16b_instreal/ksa_top_16b_inst0/s4/bc_7/U3/Y (AOI21X1)
                                         0.0167   4.2784 r
fft_8p_firststage/csubtr_9/subt_ksa_16b_instreal/ksa_top_16b_inst0/s4/bc_7/U2/Y (INVX1)
                                         0.0198   4.2981 f
fft_8p_firststage/csubtr_9/subt_ksa_16b_instreal/ksa_top_16b_inst0/s5/gc_7/U2/Y (AOI21X1)
                                         0.0167   4.3148 r
fft_8p_firststage/csubtr_9/subt_ksa_16b_instreal/ksa_top_16b_inst0/s5/gc_7/U1/Y (INVX1)
                                         0.0253   4.3401 f
fft_8p_firststage/csubtr_9/subt_ksa_16b_instreal/ksa_top_16b_inst0/s7/U11/Y (XOR2X1)
                                         0.0554 H   4.3955 f
intermediate_circ/dff_hold_sync_high_reset_inst5/mux_2_to_1_inst0/U34/Y (AOI22X1)
                                         0.0604 H   4.4559 r
intermediate_circ/dff_hold_sync_high_reset_inst5/dff_sync_high_reset_inst0/FE_OFC4705_n34/Y (INVX2)
                                         0.0204   4.4763 f
intermediate_circ/dff_hold_sync_high_reset_inst5/dff_sync_high_reset_inst0/U19/Y (AND2X2)
                                         0.0334   4.5098 f
intermediate_circ/dff_hold_sync_high_reset_inst5/dff_sync_high_reset_inst0/Q_reg[31]/D (DFFPOSX1)
                                         0.0000   4.5098 f
data arrival time                                         4.5098

clock pclk (rise edge)                               8.0000   8.0000
clock network delay (ideal)                         0.0000   8.0000
clock reconvergence pessimism                      0.0000   8.0000
intermediate_circ/dff_hold_sync_high_reset_inst5/dff_sync_high_reset_inst0/Q_reg[31]/CLK (DFFPOSX1)
                                         8.0000 r
library setup time                                -0.0653   7.9347
data required time                                 7.9347
-----data required time                            7.9347
data arrival time                                 -4.5098
-----slack (MET)                                    3.4249

```

# Conclusion

# Conclusion

- The FFT/IFFT Circuit module was successfully designed and the functionality was tested.
- Rounding mechanism needs to be improved to reduce rounding error.
- The design is synthesized in FreePDK45 library using Design Compiler and the timing and area analysis was performed. The pre-APR design can run at 200 MHZ clock frequency.
- Automatic Place and Route is performed on the final design where the placement and routing is done using Innovus. The timing violations are fixed, and the design can run at 125 MHZ frequency.
- At 125MHz, each FFT/IFFT operation can be completed at  $0.56\mu s$  which met the requirement for 802.11a standard at maximum of  $4\mu s$ .

# References

1. Maharatna, Koushik, Grass, Eckhard and Jagdhold, Ulrich (2004) A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM. *IEEE Journal of Solid-State Circuit*, 39 (3), 484-493.
2. <https://github.com/jeremytregunna/ksa>

# **Our Project's Github Repository**

<https://github.com/hibagus/64pointFFTProcessor>

☺ ☺ ☺ **Thank You ☺ ☺ ☺**