Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Squeeze extra performance out of mixed-radix algorithm #3

Open
calebzulawski opened this issue Jan 21, 2020 · 3 comments
Open

Squeeze extra performance out of mixed-radix algorithm #3

calebzulawski opened this issue Jan 21, 2020 · 3 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@calebzulawski
Copy link
Owner

When profiling with perf, a huge amount of time (40-60% of the entire transform) seems to be spent in the very first "narrow SIMD" pass, where the stride isn't large enough to fill an entire SIMD vector. Right now there is an optimization for radix-4 on AVX, but even with that the performance is underwhelming.

@ejmahler
Copy link
Contributor

Worth pointing out: The "prime factor algorithm" in Fourier isn't mathematically accurate: https://en.wikipedia.org/wiki/Prime-factor_FFT_algorithm

The prime factor algorithm is the name for a specific algorithm where no twiddle factors are necessary, because the inputs are reordered based on some number theory formulas. (Worth looking into btw - it could be faster because you significantly reduce the number of floating point operations, or slower because you have to do re-indexing).

A more accurate name for the algorithm in fourier might be mixed radix or cooley-tukey.

@calebzulawski
Copy link
Owner Author

Right, I knew I had seen that name somewhere. I was looking for a more generic name since the distinction between Cooley-Tukey and Stockham autosort generally doesn't matter, I definitely meant mixed-radix! Thanks

@calebzulawski calebzulawski changed the title Squeeze extra performance out of prime-factor algorithm Squeeze extra performance out of mixed-radix algorithm Jan 22, 2020
@calebzulawski
Copy link
Owner Author

The PFA algorithm does interest me, though I wonder if the effort/complexity is worth it compared to the existing Bluestein's algorithm implementation.

@calebzulawski calebzulawski added enhancement New feature or request help wanted Extra attention is needed labels Aug 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants