

# E0 217: Efficient and Secure Digital Circuits and Systems

## Project

Utsav Banerjee

Indian Institute of Science

Aug 2023



# Project Objectives

- Understand design trade-offs
- Optimize a circuit from gate level to architecture level
- Appreciate the economics of electronics industry



# Project Objectives

---

## Project Tasks:

- Design a digital circuit for Fast Fourier Transform (FFT) in Verilog
- Create an appropriate test bench in Verilog to test the design
- Simulate the design using Icarus Verilog and visualize the simulated VCD waveforms using GTKWave to verify functionality
- Synthesize the design using Yosys with the Nangate Open Cell 45nm standard cell library in the typical TT corner
- Analyze timing and power of the implementation using OpenSTA

# Project Overview

## FFT Specifications:

- Input  $x = \{x_0, x_1, \dots, x_7\} \rightarrow 8$  bits
- Output  $X = \{X_0, X_1, \dots, X_7\} \rightarrow 8$  bits.
- Output  $X$  is the 8-point FFT of input  $x$
- Output related to input as follows:

$$X_k = \sum_{m=0}^7 x_m e^{-i\left(\frac{2\pi km}{8}\right)} \text{ for } k = 0, 1, \dots, 7$$

- Reference: [https://en.wikipedia.org/wiki/Fast\\_Fourier\\_transform](https://en.wikipedia.org/wiki/Fast_Fourier_transform)

FFT  $\Rightarrow$  to quickly compute DFT. Implementing DFT from its formula is too slow.

$O(n^2)$   $\xrightarrow{\text{(DFT formula)}}$   $O(n \log n)$  (FFT)

# Project Overview

## FFT Specifications:

Each  $x_m$  has real + imag components  
Real part is 16 bits; Imag part is 16 bits

- Each input element  $x_m$  (for  $m = 0, 1, \dots, 7$ ) is a complex number with real and imaginary parts each represented as a signed 16-bit fixed-point quantity with 1 sign bit, 7 bits for decimal part and 8 bits for fractional part, as shown below:



# Project Overview

## FFT Specifications:

(Same as F/FP specifications)

- Each output element  $X_k$  (for  $k = 0, 1, \dots, 7$ ) is a complex number with real and imaginary parts each represented as a signed 16-bit fixed-point quantity with 1 sign bit, 7 bits for decimal part and 8 bits for fractional part, as shown below:



# Project Overview

## Block Diagram:



# Project Overview

## Design Specifications:

| Signal          | Type  | Description                    |
|-----------------|-------|--------------------------------|
| in0_real [15:0] | Input | 16-bit real part of $x_0$      |
| in0_imag [15:0] | Input | 16-bit imaginary part of $x_0$ |
| in1_real [15:0] | Input | 16-bit real part of $x_1$      |
| in1_imag [15:0] | Input | 16-bit imaginary part of $x_1$ |
| in2_real [15:0] | Input | 16-bit real part of $x_2$      |
| in2_imag [15:0] | Input | 16-bit imaginary part of $x_2$ |
| in3_real [15:0] | Input | 16-bit real part of $x_3$      |
| in3_imag [15:0] | Input | 16-bit imaginary part of $x_3$ |

# Project Overview

## Design Specifications:

| Signal          | Type  | Description                    |
|-----------------|-------|--------------------------------|
| in4_real [15:0] | Input | 16-bit real part of $x_4$      |
| in4_imag [15:0] | Input | 16-bit imaginary part of $x_4$ |
| in5_real [15:0] | Input | 16-bit real part of $x_5$      |
| in5_imag [15:0] | Input | 16-bit imaginary part of $x_5$ |
| in6_real [15:0] | Input | 16-bit real part of $x_6$      |
| in6_imag [15:0] | Input | 16-bit imaginary part of $x_6$ |
| in7_real [15:0] | Input | 16-bit real part of $x_7$      |
| in7_imag [15:0] | Input | 16-bit imaginary part of $x_7$ |

# Project Overview

## Design Specifications:

| Signal           | Type   | Description                    |
|------------------|--------|--------------------------------|
| out0_real [15:0] | Output | 16-bit real part of $X_0$      |
| out0_imag [15:0] | Output | 16-bit imaginary part of $X_0$ |
| out1_real [15:0] | Output | 16-bit real part of $X_1$      |
| out1_imag [15:0] | Output | 16-bit imaginary part of $X_1$ |
| out2_real [15:0] | Output | 16-bit real part of $X_2$      |
| out2_imag [15:0] | Output | 16-bit imaginary part of $X_2$ |
| out3_real [15:0] | Output | 16-bit real part of $X_3$      |
| out3_imag [15:0] | Output | 16-bit imaginary part of $X_3$ |

# Project Overview

## Design Specifications:

| Signal           | Type   | Description                    |
|------------------|--------|--------------------------------|
| out4_real [15:0] | Output | 16-bit real part of $X_4$      |
| out4_imag [15:0] | Output | 16-bit imaginary part of $X_4$ |
| out5_real [15:0] | Output | 16-bit real part of $X_5$      |
| out5_imag [15:0] | Output | 16-bit imaginary part of $X_5$ |
| out6_real [15:0] | Output | 16-bit real part of $X_6$      |
| out6_imag [15:0] | Output | 16-bit imaginary part of $X_6$ |
| out7_real [15:0] | Output | 16-bit real part of $X_7$      |
| out7_imag [15:0] | Output | 16-bit imaginary part of $X_7$ |

# Project Overview

## Design Specifications:

| Signal | Type   | Description                  |
|--------|--------|------------------------------|
| CLK    | Input  | Clock signal                 |
| RST_N  | Input  | Active-low reset signal      |
| write  | Input  | Input write signal           |
| start  | Input  | FFT computation start signal |
| ready  | Output | FFT computation done signal  |

RST\_N is active low signal

→ When RST\_N is low  $\Rightarrow$



Verify this

# Project Overview

## Timing Diagram:



# Project Overview

---

## Functional Verification:

- Must use the following in the test bench to verify functionality
- Inputs:  
 $x_0 = 0.0 + i0.0, x_1 = 1.0 + i0.0, x_2 = 2.0 + i0.0, x_3 = 3.0 + i0.0,$   
 $x_4 = 4.0 + i0.0, x_5 = 5.0 + i0.0, x_6 = 6.0 + i0.0, x_7 = 7.0 + i0.0$
- Calculate the expected outputs  $X'_0, X'_1, \dots, X'_7$  corresponding to 8-point FFT of  $x_0, x_1, \dots, x_7$  using MATLAB or Python

# Project Overview

---

## Functional Verification:

### Expected Outputs:

$$X'_0 = 28.0 + i0.0, X'_1 = -4.0 + i4.0(\sqrt{2} + 1),$$

$$X'_2 = -4.0 + i4.0, X'_3 = -4.0 + i4.0(\sqrt{2} - 1),$$

$$X'_4 = -4.0 + i0.0, X'_5 = -4.0 - i4.0(\sqrt{2} - 1),$$

$$X'_6 = -4.0 - i4.0, X'_7 = -4.0 - i4.0(\sqrt{2} + 1)$$

# Project Overview

---

## Functional Verification:

- Obtain the FFT circuit simulated outputs  $X_0, X_1, \dots, X_7$
- Calculate the maximum absolute error by comparing the real and imaginary parts of each element of the simulated outputs and the expected outputs as

$$\max( |Re(X'_0) - Re(X_0)|, |Im(X'_0) - Im(X_0)|, |Re(X'_1) - Re(X_1)|, \\ |Im(X'_1) - Im(X_1)|, \dots, |Re(X'_7) - Re(X_7)|, |Im(X'_7) - Im(X_7)|, )$$

# Project Overview

---

## Functional Verification:

- Although verified with one input/output combination (test vector), circuit must be functional for all possible input/output combinations
- Feel free to simulate with additional random test vectors
- Note the number of clock cycles required by the circuit to complete 8-point FFT computation, that is, from rising edge of “start” signal to rising edge of “ready” signal

# Project Overview

---

## Implementation Aspects:

- Inputs  $x_0, x_1, \dots, x_7$  stored in input register ( $8 \times 2 \times 16 = 256$  bits) and outputs  $X_0, X_1, \dots, X_7$  stored in output register ( $8 \times 2 \times 16 = 256$  bits)
- Contains both combinational and sequential logic
- Arithmetic circuits are very important components in the design
- Need not be restricted to what has been taught in lectures
- Feel free to refer to books, papers and online resources
- Must cite all references in final project report

# Project Overview

---

## Implementation Metrics:

- Number of clock cycles per FFT (from Icarus Verilog & GTKWave)
- Area of synthesized design (from Yosys)
- Maximum clock frequency (from OpenSTA)
- Total power consumption (from OpenSTA)
- Energy consumption per FFT
- Accuracy of FFT computation

# Project Overview

## Implementation Aspects:

□ Optimize the implementation for one of the following:

- low power
- low latency
- high throughput
- low area
- low energy
- high accuracy



# Project Overview

---

## Implementation Aspects:

- Implementation must be functionally correct with small error
- Explore different data flows (**butterfly operations in FFT**)
- Analyze architectures for signed fixed-point arithmetic
- Implement arithmetic with complex numbers
- Tune precision of twiddle factors
- Understand design trade-offs
- ...

# Project Report

---

- One implementation and one final report per group
- Both team members must submit **same report separately in Teams**
- Clearly document the following in report (format to be shared later):
  - architecture details with block diagrams
  - screenshots of simulation and synthesis results
  - Verilog descriptions for design and test bench
  - contributions of each team member

# Project Grading

---

- Project has 20% weightage in course
- Total 20 points
  - 5 points for implementing / simulating circuit and preparing report
  - 5 points for explaining design and providing Verilog descriptions
  - 5 points for calculating implementation metrics
  - 5 points for design trade-offs
- Implementations with very similar design and/or metrics will receive less score ⇒ do not copy each other's designs!

$\rightarrow \text{RST-N} = 0 \Rightarrow \text{O/P} = 0 ?$

$\rightarrow$  Can we assume write signal is high  
only before the start signal and not  
between rising edge of start and rising  
edge of ready signal?

$\rightarrow$  Look up where to implement clock signal  
(use diagram in page 7 as reference).

# FFT (fast fourier transform)

→ faster way to implement DFT.  
 Here we must compute only 8 point DFT  
 and not inverse DFT.



→ The base of all  $x$ 's has multiple -1.

Twiddle factors:

$\hookrightarrow \bar{x} - \bar{x}$

$\hookrightarrow$  present at the base of every 'x' of each stage.  
present at the base of every 'x' of each stage.

Twiddle factor of stage 1:

$$\hookrightarrow w_N^0 = w_8^0 = \left(e^{-\frac{-j2\pi}{8}}\right)^0 = 1$$

Twiddle factor of stage 2

$$w_8^0 = \left(e^{-\frac{-j2\pi}{N}}\right)^0 = 1$$

$$w_8^2 = \left(e^{-\frac{-j2\pi}{N}}\right)^2$$

$$\begin{aligned} \hookrightarrow w_8^2 &= \left(e^{-\frac{-j2\pi}{8}}\right)^2 = e^{-\frac{-j4\pi}{8}} = e^{-\frac{j\pi}{2}} = 0 - j = -j \\ \hookrightarrow \text{Twiddle factors of stage 2: } &0, -j \end{aligned}$$

Twiddle factors of stage 3:

$$\hookrightarrow w_8^0 = 1$$

$$\hookrightarrow w_8^1 = \frac{1}{\sqrt{2}} - \frac{j}{\sqrt{2}}$$

$$\hookrightarrow w_8^2 = -j$$

$$\hookrightarrow w_8^3 = -\frac{1}{\sqrt{2}} - \frac{j}{\sqrt{2}}$$

Let  $g \rightarrow O/P$  of first stage  
 $h \rightarrow O/P$  of second stage.

O/P of stage 1:

$$g(0) = n(0) + n(4)$$

$$g(1) = n(0) - n(4)$$

$$g(2) = n(2) + n(6)$$

$$g(3) = n(2) - n(6)$$

$$g(4) = n(1) + n(5)$$

$$g(5) = n(1) - n(5)$$

$$g(6) = n(3) + n(7)$$

$$g(7) = n(3) - n(7)$$

Op of stage 2 :

$$h(0) = g(0) + w_8^0 g(2) = g(0) + g(2)$$

$$h(1) = g(1) + w_8^1 g(3) = g(1) - j g(3)$$

$$h(2) = g(0) - w_8^2 g(3) = g(0) - g(3)$$

$$h(3) = g(1) - w_8^3 g(3) = g(1) - (-j) g(3)$$

$$h(4) = g(4) + w_8^0 g(6) = g(4) + g(6)$$

$$h(5) = g(5) + w_8^1 g(7) = g(5) - j g(7)$$

$$h(6) = g(4) - w_8^2 g(6) = g(4) - g(6)$$

$$h(7) = g(5) - w_8^3 g(7) = g(5) - (-j) g(7) = g(5) + j g(7)$$

Op of stage 3 :

$$x(0) = h(0) + w_8^0 h(4) = h(0) + h(4)$$

$$= (g(0) + g(2)) + (g(4) + g(6))$$

$$= n(0) + n(4) + n(2) + n(6) + n(1) + n(5) + n(3) + n(7)$$

$$x(0) = n(0) + n(1) + n(2) + n(3) + n(4) + n(5) + n(6) + n(7)$$

Check for  $x(5)$  :

$$\begin{aligned} x(5) &= w_1 - w_8 w(5) = w_1 - \left(\frac{(1-j)}{\sqrt{2}}\right) w(5) \\ &= \left[ g_1 - j g_3 \right] - \left(\frac{(1-j)}{\sqrt{2}}\right) \left( g_5 - j g_7 \right) \\ &= \left[ n_0 - n_4 - j(n_2 - n_6) \right] - \\ &\quad \left(\frac{(1-j)}{\sqrt{2}}\right) \left[ n_1 - n_5 - j(n_3 - n_7) \right] \end{aligned}$$

$$\begin{aligned} x(5) &= n_0 - n_4 - j n_2 + j n_6 - \left(\frac{(1-j)}{\sqrt{2}}\right) n_1 \\ &\quad + \left(\frac{(1-j)}{\sqrt{2}}\right) n_5 + \left(\frac{(1-j)}{\sqrt{2}}\right) j n_3 - \left(\frac{(1-j)}{\sqrt{2}}\right) j n_7 \end{aligned}$$

$$\begin{aligned} x(5) &= n_0 - \left(\frac{(1-j)}{\sqrt{2}}\right) n_1 - j n_2 + j \left(\frac{(1-j)}{\sqrt{2}}\right) n_3 - n_4 \\ &\quad + \left(\frac{(1-j)}{\sqrt{2}}\right) n_5 + j n_6 - \left(\frac{(1-j)}{\sqrt{2}}\right) j n_7 \end{aligned} \rightarrow ①$$

Verifying the above result with formula result

$$x(k) = \sum_{n=0}^{N-1} n[n] e^{\frac{-j2\pi nk}{N}} ; k: 0, 1, 2, \dots, N-1$$

$$\frac{5\pi}{4} = 225^\circ$$

$$x(5) = \sum_{n=0}^7 x[n] e^{-j\frac{2\pi}{8}n5} ; N=8$$

$$= \sum_{n=0}^7 x[n] \left( e^{-j\frac{5\pi}{4}} \right)^n ; N=8$$

$$x(5) = x(0) + x(1) e^{-j\frac{5\pi}{4}} + x(2) e^{-j\frac{10\pi}{4}} + x(3) e^{-j\frac{15\pi}{4}} + \\ x(4) e^{-j\frac{20\pi}{4}} + x(5) e^{-j\frac{25\pi}{4}} + x(6) e^{-j\frac{30\pi}{4}} + x(7) e^{-j\frac{35\pi}{4}}$$

$$= x(0) + x(1) \left( -0.707 + j0.707 \right) + x(2) \left( 0 - j \right) + \\ x(3) \left( 0.707 + j0.707 \right) + x(4) \left( -1 + 0 \right) + x(5) \left( 0.707 - j0.707 \right) + \\ x(6) \left( 0 + j \right) + x(7) \left( -0.707 - j0.707 \right)$$

$$x(5) = x(0) - x(1) \left( \frac{1-j}{\sqrt{2}} \right) - jx(2) + jx(3) \left( \frac{1-j}{\sqrt{2}} \right) - x(4) \\ + x(5) \left( \frac{1+j}{\sqrt{2}} \right) + jx(6) - jx(7) \left( \frac{1+j}{\sqrt{2}} \right) \rightarrow ②$$

① = ②  $\Rightarrow$  Thus proved.