Optimizing 6-step FFT algorithm #967
**RFFT (Real FFT)**

We start with the RFFT algorithm, in which we use the 6-step FFT algorithm as provided here. We test it by comparing its output against a reference implementation.
**Performance (+ Assertion)**
This Colab notebook contains the code. [Update]
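For readers without access to the notebook, the six-step decomposition for a complex FFT of length `n = n1 * n2` can be sketched as below. This is an illustrative sketch, not the notebook's code: the function name `six_step_fft` is mine, and the RFFT wrapper around the complex transform is omitted.

```python
import numpy as np

def six_step_fft(x, n1, n2):
    """Complex FFT of length n = n1 * n2 via the 6-step decomposition.
    Illustrative sketch; names are not from the notebook."""
    n = n1 * n2
    # Step 1: view x as an (n1, n2) matrix and transpose it.
    a = x.reshape(n1, n2).T.copy()            # a[j2, j1] = x[j1*n2 + j2]
    # Step 2: n2 row FFTs of length n1.
    a = np.fft.fft(a, axis=1)                 # a[j2, k1]
    # Step 3: multiply by the twiddle factors w_n^{j2*k1}.
    j2 = np.arange(n2)[:, None]
    k1 = np.arange(n1)[None, :]
    a *= np.exp(-2j * np.pi * j2 * k1 / n)
    # Step 4: transpose back to (n1, n2).
    a = a.T.copy()                            # a[k1, j2]
    # Step 5: n1 row FFTs of length n2.
    a = np.fft.fft(a, axis=1)                 # a[k1, k2]
    # Step 6: transpose and flatten, so output index is k1 + n1*k2.
    return a.T.reshape(n)
```

The row FFTs in steps 2 and 5 are where the recursive `fft_block` function discussed below would plug in.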
**Matrix Transposition**

In the RFFT code provided in the previous comment, the matrix transposition can be optimized. Currently, it is done via the following code:
However, as discussed in #938, we can use a cache-oblivious algorithm (see #965) for matrix transposition as follows:
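The notebook has the actual implementation; as a rough illustration (a sketch, not the notebook's code, with an illustrative name and base-case size), a recursive cache-oblivious out-of-place transpose looks like this:

```python
import numpy as np

def co_transpose(src, dst, r0, r1, c0, c1, base=32):
    """Cache-oblivious transpose of src[r0:r1, c0:c1] into dst.
    Recursively halves the longer dimension; small tiles are copied
    directly, so some tile always fits in cache regardless of its size."""
    if (r1 - r0) <= base and (c1 - c0) <= base:
        dst[c0:c1, r0:r1] = src[r0:r1, c0:c1].T   # small tile: copy directly
    elif (r1 - r0) >= (c1 - c0):
        rm = (r0 + r1) // 2                        # split the rows
        co_transpose(src, dst, r0, rm, c0, c1, base)
        co_transpose(src, dst, rm, r1, c0, c1, base)
    else:
        cm = (c0 + c1) // 2                        # split the columns
        co_transpose(src, dst, r0, r1, c0, cm, base)
        co_transpose(src, dst, r0, r1, cm, c1, base)
```

The recursion never consults the cache size, which is what makes the algorithm cache-oblivious: it performs well across cache hierarchies without tuning.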
**Performance (+ Assertion)**
This Colab notebook contains the code.
**fft_block**

In steps 2 and 5 of the 6-step FFT algorithm, we use the following recursive function:
We can think of the following ways to speed this up:

(1) Avoid calling […]
(2) In (1), we have:
An initial […]

(3) The factor […] For a given […], note that […]
Therefore, in this version, in addition to (2), we can replace the following for-loop
with this code:
(4) We can precompute all the factors and pass the precomputed array as an argument to the recursive function. We can use the following code to compute the array:
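A sketch of this idea, under the assumption that `fft_block` is a radix-2 recursive FFT (function names here are illustrative, not the notebook's): all levels of the recursion can share one twiddle array computed for the top-level size, with deeper levels reading it at a stride.

```python
import numpy as np

def precompute_twiddles(n):
    """All twiddle factors w_n^k = exp(-2*pi*i*k/n) for k < n/2.
    A recursion level of size m reuses them with stride n // m."""
    return np.exp(-2j * np.pi * np.arange(n // 2) / n)

def fft_rec(x, tw):
    """Radix-2 recursive FFT taking the precomputed twiddle array (sketch)."""
    n = len(x)
    if n == 1:
        return x.astype(complex)
    even = fft_rec(x[::2], tw)
    odd = fft_rec(x[1::2], tw)
    stride = (2 * len(tw)) // n          # = n_top / n_level
    t = tw[::stride][: n // 2] * odd     # this level's twiddle factors
    return np.concatenate([even + t, even - t])
```

Since the array is computed once and only read afterwards, it can safely be shared across all calls for inputs of the same length.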
**Performance (+ Assertion)**

In this Colab notebook, the performance of these four new versions is checked and compared with the baseline. In the 6-step FFT, the recursive function […]
**transpose + twiddle factor**

This is regarding steps 3 and 4 of the 6-step algorithm. Originally, we have the following code:
which transposes the matrix and multiplies its elements by the twiddle factors. The twiddle factor […] We can avoid calling […]

We still need to call […]
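One way to fuse steps 3 and 4 is to apply the twiddle factors while transposing, generating each row of twiddles from the previous one with a single elementwise multiply instead of calling `exp` for every element. A sketch under that assumption (not the notebook's code; the name is mine):

```python
import numpy as np

def transpose_twiddle(a, n):
    """Transpose the (n1, n2) matrix a and multiply entry (j2, k1) of the
    result by w_n^{j2*k1}. Row j2+1 of the twiddles is row j2 times a
    fixed step vector, so exp is called only once, for the step vector."""
    n1, n2 = a.shape
    out = np.empty((n2, n1), dtype=complex)
    step = np.exp(-2j * np.pi * np.arange(n1) / n)  # w_n^{k1}
    row = np.ones(n1, dtype=complex)                # w_n^{0*k1}
    for j2 in range(n2):
        out[j2] = a[:, j2] * row
        row *= step                                 # -> w_n^{(j2+1)*k1}
    return out
```

Note that the repeated multiplication accumulates a small amount of rounding error; for very large sizes it may be worth re-seeding `row` from `exp` every few hundred rows.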
**Performance (+ Assertion)**

This Colab Notebook contains the code that shows the performance of these three versions. As observed, the last version mentioned above outperforms the others.
**Function** […]
**Eight-step function**

We now work on the eight-step function as provided in this comment.
We focus on the following part of the code:
Note 1: We can move […]

Note 2: The factor […]

**Performance (+ Assertion)**

This Colab Notebook contains the code. The following result is obtained:

As observed, versions 1 and 2 show close performance. The code in version 2 is more complicated, though, so we go with version 1.
**Put pieces together...**

It is now time to put the pieces together and compare our so-far-optimized RFFT with the one provided in this comment. Code is available in this Colab notebook. The running time of RFFT is recorded for inputs of length 2^2 ... 2^20. For each length, the function is called 5000 times. Since the running times are small, small deviations may result in considerable differences when calculating the speed-up ratio (w.r.t. the running time of
and we can calculate the mean-based speed-up as follows:
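The notebook has the actual computation; as an illustration only (the function name and the choice of summaries are mine, not the notebook's), mean-based speed-ups over repeated timings could be summarized like this:

```python
import numpy as np

def speedup_summary(t_ref, t_new):
    """Summarize speed-up of new timings vs. reference timings (sketch)."""
    t_ref = np.asarray(t_ref, dtype=float)
    t_new = np.asarray(t_new, dtype=float)
    ratio = t_ref / t_new                    # > 1 means the new code is faster
    return {
        "ratio_of_means": t_ref.mean() / t_new.mean(),
        "mean_of_ratios": ratio.mean(),
        "geometric_mean": float(np.exp(np.log(ratio).mean())),
    }
```

The geometric mean is the usual choice for averaging ratios across benchmarks, since it is symmetric under swapping the reference and the new code.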
Not sure if this approach is science-backed. Still, it can give us some idea about the range. In Colab, I got the following result (lower is better):
[Recap] […] [Now] […]

We are now getting closer to SciPy's performance! The code is available in this Colab notebook, and it has assertions to make sure the output is correct.
This issue is to optimize the 6-step FFT algorithm, as discussed initially in #938. We will try to improve the performance of each block of the algorithm. Each result MUST come with end-to-end code, and the code MUST include assertions to make sure the output is correct.