Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve fbp speed #1505

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Improve fbp speed #1505

wants to merge 4 commits into from

Conversation

adler-j
Copy link
Member

@adler-j adler-j commented Jun 4, 2019

It seems that pyfftw real to complex FFT is not optimised for non-powers of two. This fix increases performance for 1000 angles and 500 pixels from 180ms to 3ms on my laptop.

@adler-j adler-j requested a review from kohr-h June 4, 2019 12:07
Copy link
Member

@kohr-h kohr-h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the general idea, but the next power of 2 is not necessarily optimal. In particular pyfftw has a utility function next_fast_len that computes the optimal size. The scipy.fftpack version (though not supported by us) has an analogous function. Perhaps Numpy's FFT has the same characteristics, but since it's slower than pyFFTW anyway, it's probably okay to not optimize it.

So I suggest that you use pyfftw.next_fast_len for pyfftw backend, and either do nothing for numpy or guess and use the same.

@adler-j
Copy link
Member Author

adler-j commented Jun 6, 2019

Ok, fix for that coming along. Sadly it seems that the function you mentioned is not 100% valid for e.g. rfft so I need to special case that. This function demonstrates the results anyway with that fix:

import time
import numpy as np
import matplotlib.pyplot as plt
import odl
import pyfftw

class Timer:
    def __init__(self, print_at_exit=True):
        self.print_at_exit = print_at_exit

    def __enter__(self):
        self.start_time = time.time()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.end_time = time.time()
        self.exc_type = exc_type
        if self.print_at_exit:
            print(self)

    def __repr__(self):
        return "<{} duration={}{}>".format(self.__class__.__name__, self.duration, self.exception())

    def exception(self):
        return " exception={}".format(self.exc_type.__name__) if self.exc_type else ""

    @property
    def duration(self):
        try:
            return self.end_time - self.start_time
        except AttributeError:
            return -1


impl = 'numpy'
pad = True

if impl == 'pyfftw':
    fft = pyfftw.interfaces.numpy_fft
elif impl == 'numpy':
    fft = np.fft


nv = np.arange(10, 512)
times = np.zeros_like(nv, dtype='float')

for i, n in enumerate(nv):
    if pad:
        npad = odl.trafos.util.next_fast_len(int(n), impl=impl, rfft=True)
    else:
        npad = n
    arr = np.zeros([npad, npad])
    with Timer() as timer:
        fft.irfft(fft.rfft(arr))
    times[i] = timer.duration

plt.semilogy(nv, times, label=impl + '_rfft_' + str(pad))

times = np.zeros_like(nv, dtype='float')

for i, n in enumerate(nv):
    if pad:
        npad = odl.trafos.util.next_fast_len(int(n), impl=impl, rfft=False)
    else:
        npad = n
    arr = np.zeros([npad, npad])
    with Timer() as timer:
        fft.ifft(fft.fft(arr))
    times[i] = timer.duration

plt.semilogy(nv, times, label=impl + '_fft_' + str(pad))

plt.legend()

image

image

@kohr-h
Copy link
Member

kohr-h commented Jun 7, 2019

Very interesting, thanks for the analysis @adler-j! As it turns out, SciPy's function was first and was ported to pyFFTW later: pyFFTW/pyFFTW#159. I see quite some benchmarking going on there, too, but it's possible that only C2C was covered.

Might be good to make them aware?

Copy link
Member

@kohr-h kohr-h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some stuff

@@ -1607,6 +1608,28 @@ def unique(seq):
return unique_values


def nextpow2(n):
"""
Compute the integer which is a power of two.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On first line.
An maybe: "Return the next integer power of 2."
Actually, after the change, it's more like "Return the exponent of the next integer power of 2."

Parameters
----------
n : int

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No text? Not that there's a lot to say, but doc formatting might break.

>>> odl.util.nextpow2(7)
8
>>> odl.util.nextpow2(513)
1024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doctests ought to fail now

@@ -1607,6 +1608,28 @@ def unique(seq):
return unique_values


def nextpow2(n):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next_int_log2?

@@ -47,6 +47,7 @@
'repr_string',
'attribute_repr_string',
'method_repr_string',
'nextpow2',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although tiny, I'd probably put this into numerics.py, if only to avoid littering this kitchen sink file.

@@ -649,6 +650,19 @@ def reciprocal_space(space, axes=None, halfcomplex=False, shift=True,
return recip_spc


def next_fast_len(n, impl, rfft=False):
"""The smallest size for which a fast fft implementation is available."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FFT

@@ -649,6 +650,19 @@ def reciprocal_space(space, axes=None, halfcomplex=False, shift=True,
return recip_spc


def next_fast_len(n, impl, rfft=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe is_rfft

@grlee77
Copy link
Contributor

grlee77 commented Jun 7, 2019

Ok, fix for that coming along. Sadly it seems that the function you mentioned is not 100% valid for e.g. rfft so I need to special case that. This function demonstrates the results anyway with that fix:

Do you mean that it isn't 100% valid in SciPy? In FFTW, I am pretty sure it is equally valid for both fft and rfft functions. I think the only case where it doesn't apply are a subset of the real-to-real transforms (e.g. DCT-I, which are not currently wrapped in pyFFTW).

Here is a figure I generated based on your script but calling

npad = pyfftw.interfaces.scipy_fftpack.next_fast_len(int(n))

instead of

npad = odl.trafos.util.next_fast_len(int(n), impl=impl, rfft=True)

rfft_single_thread_estimate

Also, if you enable caching of plans and increase the planner effort and number of threads, the difference can become larger. Here is a case using 'FFTW_MEASURE' and 10 threads on 10-core SkylakeX CPU. In this case you need to request this near the top of the benchmark script

pyfftw.config.NUM_THREADS = 10
pyfftw.config.PLANNER_EFFORT = 'FFTW_MEASURE'
pyfftw.interfaces.cache.enable()

and make sure to call the transform once before entering the Timer so the cached Plan will be used

        # call once so the FFTW plan will be cached
        fft.irfft(fft.rfft(arr))

        # average time over 10 calls
        nreps = 10
        with Timer() as timer:
            for n in range(nreps):
                fft.irfft(fft.rfft(arr))
        times[i] = timer.duration / nreps

rfft_10threads_measure

@grlee77
Copy link
Contributor

grlee77 commented Jun 7, 2019

Related to this, the following work might eventually remove the need to maintain multiple FFT backends in ODL itself:

We currently have a student working on support for multiple FFT backends for SciPy as part of Google Summer of Code (GSOC). As of now it is only week 2 of 12 for the project, but the goal is to allow use of various backends (pyFFTW, pocketfft, mkl-fft, etc) via a unified API using a proposed scipy.fft module.
see: https://github.com/scipy/scipy/wiki/GSoC-2019-project-ideas
some initial work in: scipy/scipy#10238

@kohr-h
Copy link
Member

kohr-h commented Jun 7, 2019

Many thanks for the analysis @grlee77! Re: the new scipy.fft effort: That would make things so much easier! Looking forward to trying it out and simplifying our code!

@adler-j
Copy link
Member Author

adler-j commented Jun 9, 2019

It should be mentioned that there is already the pyfftw numpy compatibility interfaces: https://hgomersall.github.io/pyFFTW/pyfftw/interfaces/interfaces.html the problem with them was that they do not provide optimal usage in terms of e.g. caching and planning. Is the planned scipy module supposed to solve this somehow?

@kohr-h
Copy link
Member

kohr-h commented Jun 9, 2019

To me it seems like the exposed interface capabilities will depend on the backend. The SciPy interface, for instance, has always had overwrite_x as option, although I don't know how it affects performance. So I guess the final interface will look identical in the first couple of (positional) arguments and add optional args depending on the backend. Ideally with all the caching and in-place options supported.

But @grlee77 should know better.

@kohr-h kohr-h added this to In revision in PR Status Jun 11, 2019
@grlee77
Copy link
Contributor

grlee77 commented Jun 12, 2019

It should be mentioned that there is already the pyfftw numpy compatibility interfaces: https://hgomersall.github.io/pyFFTW/pyfftw/interfaces/interfaces.html the problem with them was that they do not provide optimal usage in terms of e.g. caching and planning.

pyFFTW 0.11 added the ability to configure the default number of threads and planner effort without having to explicitly pass the number of threads and effort to each individual function call (this was used in my example above). When the cache is enabled, I think this should provide most of the performance one would get when explicitly preplanning transforms.

Is the planned scipy module supposed to solve this somehow?

There are still open questions as to what the final form of this will be and whether backend-specific arguments will be supported, so any feedback you have as a potential downstream user would be greatly appreciated. The issue I linked above was mainly for the pocketfft work, but some initial backend discussion has been in scipy/scipy#10175 and some earlier posts on the scipy-dev mailing list. There is not yet an open PR for the backend support (Peter started out with the addition of pypocketfft to SciPy and is just now starting to implement the backends). Peter prepared a design document with initial plans that will be revised as needed going forward. At the moment we are exploring whether or not we should use uarray to implement the backend.

The SciPy interface, for instance, has always had overwrite_x as option, although I don't know how it affects performance.

It can help some, but has also caused some confusion on the user end because setting it to True is not a guarantee that the array will be overwritten (this is only possible for some transform types). So the user should not try to use it in the same way as the out= argument provided by many numpy functions (it is basically like FFTW's FFTW_DESTROY_INPUT flag).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
PR Status
In revision
Development

Successfully merging this pull request may close these issues.

None yet

3 participants