

# DAT094 Lab 4: Downloading and evaluating on the FPGA

Weihan Gao – weihanga@chalmers.se

October 30, 2022

## 1 Introduction

In this lab, we worked with two implementations of digital filters for two different types of averages and evaluated their hardware cost and performance. We also tested the implementation with external stimuli on the FPGA.

## 2 The system

The top design file MAC\_gen\_min\_full.vhdl contains seven components: MAC\_gen\_min\_12\_17, two convert\_data\_format instances, sample\_clock, SPI\_clock, SPI\_AD, and SPI\_DA, connected by a number of signal lines. SPI\_AD and the converter module that follows it turn the analog input into the digital signed\_data, which is then filtered by MAC\_gen\_min\_12\_17. After passing through a converter module again, the filtered signal is converted back to analog by the SPI\_DA module. SPI\_clock and sample\_clock generate the clock\_enable signals.

The MAC\_gen\_min\_full module implements the averaging filter algorithm. It computes each filtered value as an average over a finite number of input samples. That number is the number of TAPs, which is set by GENERIC parameters. The SIGNAL\_WIDTH and COEFF\_WIDTH generics set the width of the data lines used in the calculation: DA\_data/AD\_data between SPI\_DA/SPI\_AD and the converters, and filtered\_data/signed\_data between the converters and MAC\_gen\_min. Together, these parameters therefore determine the hardware area (more TAPs, more area), the critical set-up timing constraint (more TAPs, longer delay), the dynamic power (wider data lines mean more states in SPI\_FSM, so more bits toggle), and the energy per operation.
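The multiply-accumulate structure described above can be sketched in a few lines of Python. This is only a behavioural illustration, not the lab's VHDL: the function name `fir_mac`, the 4-TAP length, and the equal-weight coefficients are assumptions chosen to make the averaging visible.

```python
def fir_mac(samples, coeffs):
    """Direct-form FIR: one multiply-accumulate per TAP for every output sample."""
    taps = len(coeffs)
    delay_line = [0] * taps                  # most recent sample sits at index 0
    out = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]   # shift the new sample in
        acc = 0
        for c, d in zip(coeffs, delay_line):
            acc += c * d                     # the MAC operation
        out.append(acc)
    return out


# Hypothetical 4-TAP averaging filter with equal weights (not the lab's coefficients).
coeffs = [0.25] * 4
step = [1.0] * 6
response = fir_mac(step, coeffs)
print(response)  # [0.25, 0.5, 0.75, 1.0, 1.0, 1.0]
```

The inner loop runs once per TAP, which is why the number of TAPs drives both the number of multipliers in a parallel implementation and the length of the accumulation path.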



Figure 1: Block diagram.

In lab 3, we saw that SPI\_LDA converts parallel data into serial data using an FSM. The SCLK\_enable signal from the SPI\_CLK block determines how many states the FSM steps through and how long it takes. If the signal word length is 17, the FSM needs 17 conversion states (leaving aside configuration and other states), which means 17 valid SCLK\_enable events are needed to step through them. SCLK\_en and sample\_en are generated by frequency dividers. The SPI\_CLK block, for example, has a counter that counts rising edges of the system clk, and when the count reaches the PERIOD\_constant we set, SCLK\_en flips. The period of the output clock is therefore PERIOD\_constant times the system clock period.
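The counter-based divider can be modelled cycle by cycle as follows. This sketch assumes the enable is a single-cycle pulse issued once every `period_constant` system clock cycles, which matches the stated output period. The function name and the value 4 are hypothetical, not taken from the lab files.

```python
def clock_enable(period_constant, n_cycles):
    """Return one enable value per system clock cycle: 1 once every period_constant cycles."""
    counter = 0
    enables = []
    for _ in range(n_cycles):
        if counter == period_constant - 1:
            enables.append(1)   # terminal count reached: issue the enable pulse
            counter = 0
        else:
            enables.append(0)
            counter += 1
    return enables


PERIOD_CONSTANT = 4   # hypothetical divider setting
WORD_LENGTH = 17      # signal word length from the text

pulses = clock_enable(PERIOD_CONSTANT, PERIOD_CONSTANT * WORD_LENGTH)
print(sum(pulses))    # 17 enable pulses, one per serialized bit
```

Under this assumption, shifting out one 17-bit word takes 17 × `PERIOD_CONSTANT` system clock cycles.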

The highest system clock frequency we found is about 41.7 MHz, with 0.171 ns of slack. In lab 3, the highest frequency was 51 MHz, which is higher than 41.7 MHz. This may be because the combinational converter blocks lie on the critical path. The hardware utilization also differs from lab 3: this design uses 70 more LUTs and some DSPs. The DSPs account for about 5% of the available resource, close to the 6% for I/O. The new serial design has the same structure as the previous one. The highest frequency we found for it is $1/92\,\text{ns} \approx 10.9$ MHz.

| Setup                                         | Hold                             | Pulse Width                |
|-----------------------------------------------|----------------------------------|----------------------------|
| Worst Negative Slack (WNS): 0.171 ns          | Worst Hold Slack (WHS): 0.109 ns | Worst Pulse Width Slack (  |
| Total Negative Slack (TNS): 0.000 ns          | Total Hold Slack (THS): 0.000 ns | Total Pulse Width Negative |
| Number of Failing Endpoints: 0                | Number of Failing Endpoints: 0   | Number of Failing Endpoi   |
| Total Number of Endpoints: 647                | Total Number of Endpoints: 647   | Total Number of Endpoints  |
| All user specified timing constraints are met |                                  |                            |

Figure 2: Time.

| Name                                      | Slice LUTs<br>(63400) | Slice Registers<br>(126800) | Slice<br>(15850) | LUT as Logic<br>(63400) | DSPs<br>(240) | Bonded IOB<br>(210)  | BUFGCTRL<br>(32) |
|-------------------------------------------|-----------------------|-----------------------------|------------------|-------------------------|---------------|----------------------|------------------|
| MAC_gen_min_full_12_17                    | 401                   | 316                         | 182              | 401                     | 13            | 13                   | 1                |
| MAC_gen_min_12_17_inst(MAC_gen_min_12_17) | 329                   | 247                         | 145              | 329                     | 13            | 0                    | 0                |
| sample_clock_inst(sample_clock)           | 11                    | 73                          | 7                | 11                      | 0             | 0                    | 0                |
| SPI_AD_inst(SPI_AD)                       | 23                    | 31                          | 14               | 23                      | 0             | 0                    | 0                |
| SPI_Clock_inst(SPI_clock)                 | 9                     | 9                           | 4                | 9                       | 0             | 0                    | 0                |
| SPI_DA_inst(SPI_DA)                       | 29                    | 46                          | 20               | 29                      | 0             | 0                    | 0                |

Figure 3: Utilization.
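The percentages quoted in the text follow from the top-level row of the table above and the device totals in its header. A minimal check, using dictionary keys that are just illustrative labels:

```python
# Device totals and top-level usage, both read from the utilization table.
available = {"Slice LUTs": 63400, "DSPs": 240, "I/O": 210}
used = {"Slice LUTs": 401, "DSPs": 13, "I/O": 13}

percent = {name: 100.0 * used[name] / available[name] for name in used}

for name, value in percent.items():
    print(f"{name}: {value:.1f}%")
# Slice LUTs: 0.6%
# DSPs: 5.4%
# I/O: 6.2%
```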



Figure 4: Utilization.

## 3 Execution on FPGA

We applied a 1 kHz, 1 Vpp sine wave to the AD input and used the oscilloscope to monitor this signal on channel 2 and the DA output on channel 1. The output amplitude depends on the input signal frequency. We swept the frequency upward from zero and found that at 10.86 kHz the output amplitude had fallen by a factor of about $660/940 \approx 1/\sqrt{2}$. This means the design is a low-pass filter with a cutoff frequency of 10.86 kHz.
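The link between the measured ratio and the cutoff can be checked numerically: a gain of $1/\sqrt{2}$ corresponds to about −3 dB. The sketch below assumes 660 and 940 are the two amplitude readings in the same unit; the variable names are illustrative.

```python
import math

v_cutoff = 660     # amplitude reading at 10.86 kHz
v_passband = 940   # reference amplitude reading

ratio = v_cutoff / v_passband
gain_db = 20 * math.log10(ratio)

print(f"ratio      = {ratio:.3f}")                 # 0.702
print(f"1/sqrt(2)  = {1 / math.sqrt(2):.3f}")      # 0.707
print(f"gain       = {gain_db:.2f} dB")            # -3.07 dB
```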



Figure 5: Wave of input and output.

## 4 A second parameter set

The differences are larger for MAC\_gen\_min than for MAC\_gen\_ser. As the number of TAPs increases, the resource utilization of MAC\_gen\_min grows, mainly in DSPs, which rise from 5% to 23%. The LUT count is 2780 higher than before. The highest frequency is about $1/124\,\text{ns} \approx 8.1\,\text{MHz}$, with 0.8 ns of slack.
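The conversion between clock period and frequency used for these figures is a simple reciprocal. The helper names below are hypothetical:

```python
def fmax_mhz(period_ns):
    """Clock frequency in MHz for a given clock period in ns."""
    return 1000.0 / period_ns


def period_ns(freq_mhz):
    """Clock period in ns for a given clock frequency in MHz."""
    return 1000.0 / freq_mhz


print(f"{fmax_mhz(124):.2f} MHz")    # 8.06 MHz for the 124 ns constraint
print(f"{period_ns(41.7):.2f} ns")   # 23.98 ns for the 41.7 MHz design
```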



Figure 6: Utilization.



Figure 7: Time.

Another difference is in the frequency domain. This filter is a band-pass filter rather than a low-pass filter. Its cutoff frequencies are 8.14 kHz and 12.84 kHz.
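Two derived figures can be read off the measured cutoff frequencies: the bandwidth and a center frequency. Taking the geometric mean as the center is one common convention and is an assumption here, not something measured in the lab.

```python
import math

f_low_khz = 8.14    # lower cutoff from the measurement
f_high_khz = 12.84  # upper cutoff from the measurement

bandwidth_khz = f_high_khz - f_low_khz
center_khz = math.sqrt(f_low_khz * f_high_khz)  # geometric-mean convention (assumed)

print(f"bandwidth = {bandwidth_khz:.2f} kHz")   # 4.70 kHz
print(f"center    = {center_khz:.2f} kHz")      # 10.22 kHz
```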

## 5 Summary

Different filters differ in algorithm, performance, hardware utilization, and signal-processing quality. Even for the same algorithm, different implementations lead to different results. If we want a smoother FIR-filtered signal, we may need more TAPs, but this also changes the frequency response. The trade-off therefore has to take signal-processing considerations into account as well.
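The dependence of the frequency response on the number of TAPs can be illustrated with an equal-weight moving average. This is a sketch under stated assumptions: the lab's actual coefficients and sample rate are not used, `FS` is a hypothetical sample rate, and the function names are illustrative.

```python
import cmath
import math


def magnitude(n_taps, f, fs):
    """|H(f)| of an n_taps equal-weight moving average at frequency f, sample rate fs."""
    h = sum(cmath.exp(-2j * math.pi * f * k / fs) for k in range(n_taps)) / n_taps
    return abs(h)


def cutoff(n_taps, fs, step=10.0):
    """First frequency, swept upward from 0 in `step` Hz, where the gain is at most 1/sqrt(2)."""
    f = 0.0
    while magnitude(n_taps, f, fs) > 1 / math.sqrt(2):
        f += step
    return f


FS = 100_000.0  # hypothetical sample rate in Hz, not taken from the lab

for n in (4, 8, 16):
    print(n, "TAPs ->", cutoff(n, FS), "Hz")
```

For this filter family, each doubling of the TAP count roughly halves the cutoff frequency, so extra smoothing is paid for with a narrower passband as well as with more hardware.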