Skip to content

blackwer/fft_bench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Benchmarking various FFT implementations

FFTW3 run in FFTW_MEASURE mode. FFTW_PATIENT is death, and I wanted to give it a fair shot with the default. It's compiled with

./configure --enable-openmp --enable-shared=yes --enable-sse2 --enable-avx\
    --enable-avx2 --enable-avx512 --disable-kcvi --disable-vsx \
    --disable-avx-128-fma --enable-fma

using gcc 12.2.0. All benchmarks are compiled with native instructions enabled: -march=native -O3.

MKL FFTW KISS Pocket DUCC Sleef
2023.1.0 3.3.10 131.1.0 81d171a6 0.32.0 3.5.1

Single threaded results

1D

2D

3D

Multi threaded results

All benchmarks are run on a single socket, using all available cores on that socket. I.e.

# Rome: 64 cores
# Skylake: 20 cores
# Icelake: 32 cores

export OMP_NUM_THREADS=$((nproc / 2))
export OMP_PROC_BIND=spread
export OMP_PLACES=threads

taskset -c 0-$((OMP_NUM_THREADS-1)) blah_bench args

Note that KISS has openmp enabled, but I didn't do a separate build for it. Given its performance, I am happy to ignore it. So the following benchmarks are just for mkl/fftw3.

1D

2D

Note the AMD measurements are not in error. This really happens consistently. MKL is very unhappy with more than 16 threads for these particular sizes in 2D.

3D

About

More benchmarks of various fft implementations

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published