PATIENT and EXHAUSTIVE plans v slow compared to MEASURE #239

Open
AshtonSBradley opened this issue May 13, 2022 · 1 comment

AshtonSBradley commented May 13, 2022

These timings are surprising to me.

It seems that MEASURE picks a fast plan, but PATIENT and EXHAUSTIVE do not. Judging by CPU usage, the plans created with those two flags do not appear to use threads.

Is this expected, or am I using the flags incorrectly?

Also, PATIENT and EXHAUSTIVE do not seem to reduce allocations?

using FFTW, BenchmarkTools
N = 512
A = randn(ComplexF64,N,N)
B = copy(A)

## measure
FFTW.forget_wisdom()
FFTW.set_num_threads(8)

P = plan_fft(A,flags=FFTW.MEASURE);
@btime $P*$A;
  1.409 ms (138 allocations: 4.01 MiB)

P! = plan_fft!(A,flags=FFTW.MEASURE);
@btime $P!*$B setup=(B .= A);
  1.401 ms (137 allocations: 9.56 KiB)
## patient
FFTW.forget_wisdom()
FFTW.set_num_threads(8)

P = plan_fft(A,flags=FFTW.PATIENT);
@btime $P*$A;
  458.462 ms (113592 allocations: 11.24 MiB)

P! = plan_fft!(A,flags=FFTW.PATIENT);
@btime $P!*$B setup=(B .= A);
  914.090 ms (226997 allocations: 14.46 MiB)
## exhaustive
FFTW.forget_wisdom()
FFTW.set_num_threads(8)

P = plan_fft(A,flags=FFTW.EXHAUSTIVE);
@btime $P*$A;
  500.417 ms (124745 allocations: 11.94 MiB)

P! = plan_fft!(A,flags=FFTW.EXHAUSTIVE);
@btime $P!*$B setup=(B .= A);
  919.095 ms (227010 allocations: 14.46 MiB)

This is on

julia> versioninfo()
Julia Version 1.8.0-beta3
Commit 3e092a2521 (2022-03-29 15:42 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_PKG_DEVDIR = /Users/abradley/Dropbox/Julia/Dev
  JULIA_NUM_THREADS = 8
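
For anyone trying to reproduce this, the comparison above can be sketched as a single helper that runs every flag through the same setup and reports planning time separately from execution time. This is only a sketch under assumptions: `bench_flag` and `compare_flags` are hypothetical names, not part of FFTW.jl, and the output array is preallocated with `mul!` so that allocations made by the plan itself are easier to see. The input is filled after planning as a precaution, since planning with flags other than ESTIMATE may overwrite the array.

```julia
using FFTW, BenchmarkTools
using LinearAlgebra: mul!

# Hypothetical helper (not part of FFTW.jl): plan an out-of-place 2D FFT with
# the given flag and return planning time and minimum execution time in seconds.
function bench_flag(flag; N = 512, nthreads = 8)
    FFTW.forget_wisdom()               # start every flag from a clean slate
    FFTW.set_num_threads(nthreads)     # set the thread count before planning
    A = zeros(ComplexF64, N, N)
    out = similar(A)
    t0 = time()
    P = plan_fft(A; flags = flag)      # planning cost is paid here only
    t_plan = time() - t0
    A .= randn(ComplexF64, N, N)       # fill after planning (precaution, see above)
    t_exec = @belapsed mul!($out, $P, $A)
    return (plan = t_plan, exec = t_exec)
end

# Hypothetical driver: print one line per planner flag.
function compare_flags(; N = 512, nthreads = 8)
    flags = (("ESTIMATE", FFTW.ESTIMATE), ("MEASURE", FFTW.MEASURE),
             ("PATIENT", FFTW.PATIENT), ("EXHAUSTIVE", FFTW.EXHAUSTIVE))
    for (name, flag) in flags
        r = bench_flag(flag; N = N, nthreads = nthreads)
        println(rpad(name, 11), "plan: ", round(r.plan; digits = 3),
                " s   exec: ", round(1e3 * r.exec; digits = 3), " ms")
    end
end
```

Calling `compare_flags()` reproduces the 512 × 512, 8-thread case; calling it again with `nthreads = 1` would show whether the slowdown is tied to threading. Planning with PATIENT or EXHAUSTIVE can take a while at this size.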

hsgg commented Oct 18, 2023

Yep, I see the same thing, but only on Apple M1.
Also worth noting: FFTW.ESTIMATE gives a transform about 30% faster than FFTW.MEASURE.
I see this for in-place transforms of 512^3 boxes.
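
A minimal sketch of how the in-place 3D comparison could be checked, assuming complex input and a hypothetical helper named `bench_inplace3d` (not part of FFTW.jl). The default `N = 64` keeps memory use small; `N = 512` needs about 2 GiB per `ComplexF64` array, and this helper holds two of them. The buffer is reset before each sample so that repeated unnormalized transforms do not overflow.

```julia
using FFTW, BenchmarkTools

# Hypothetical helper (not part of FFTW.jl): minimum time in seconds of an
# in-place FFT on an N x N x N complex box, planned with the given flag.
function bench_inplace3d(flag; N = 64, nthreads = 8)
    FFTW.forget_wisdom()
    FFTW.set_num_threads(nthreads)
    A = randn(ComplexF64, N, N, N)     # reference data, never transformed
    B = zeros(ComplexF64, N, N, N)     # working buffer; planning may overwrite it
    P! = plan_fft!(B; flags = flag)
    # Restore B from A before every sample; evals = 1 so each sample is one transform.
    return @belapsed $P! * $B setup = (copyto!($B, $A)) evals = 1
end
```

Comparing `bench_inplace3d(FFTW.ESTIMATE)` with `bench_inplace3d(FFTW.MEASURE)` at the same `N` and `nthreads` gives the ratio described above.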
