Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace.remove_response() without nfft_pow2 option slow for certain npts #715

Closed
wants to merge 1 commit into from

Conversation

megies
Copy link
Member

@megies megies commented Jan 17, 2014

Dear developers,

when working with large traces (say, nb of data ~ 1,000,000), the rfft inside 'remove_response' method hangs (on my computer at least).

In 'simulate' method there is an option nfft_pow2, which I always use to get away with this.

Is it possible to add this option to remove_response?

Thanks
BG

@megies
Copy link
Member

megies commented Jan 17, 2014

Hmm.. when putting together remove_response() I wanted to omit as much clutter as possible and I played with the nfft option a bit and really did not see it making any notable difference..

I just checked again a bit more thoroughly and for me the nfft option slows down the routine significantly:
screenshot from 2014-01-17 11 00 54

I am not convinced that reintroducing this option would help..
How did you install ObsPy? Is it possible for you to temporarily install obspy from a test branch? pip install https://github.com/megies/obspy/archive/nfftpow2.zip should do the trick (depending on your Python setup/installation). Please run the following in Ipython then and post the output.

from obspy import read
import numpy as np

np.random.seed(123)
tr = read()[0]

tr.data = np.random.random(int(1e3))
%timeit tr.copy().remove_response(nfft_pow2=True)
%timeit tr.copy().remove_response(nfft_pow2=False)

tr.data = np.random.random(int(1e4))
%timeit tr.copy().remove_response(nfft_pow2=True)
%timeit tr.copy().remove_response(nfft_pow2=False)

tr.data = np.random.random(int(1e5))
%timeit tr.copy().remove_response(nfft_pow2=True)
%timeit tr.copy().remove_response(nfft_pow2=False)

tr.data = np.random.random(int(1e6))
%timeit tr.copy().remove_response(nfft_pow2=True)
%timeit tr.copy().remove_response(nfft_pow2=False)

@krischer
Copy link
Member

There are some rare cases where numpy's FFT implementation is extremely slow. There is also some discussion on the numpy mailing list about this but I cannot find it right now, here is a github ticket instead:
numpy/numpy#1864

It depends on the number of samples in the data array fed to numpy.ftt.fft(). In that cases the nfft_pow2 option, while slower in most other cases, is much faster.

It probably depends on the machine and installation, but try the following. Note that the number of samples in both examples only differs by one!

>>> import timeit
>>>timeit.timeit("import numpy as np; np.fft.rfft(np.empty(1037109))", number=1) 
0.7895288467407227

# This one does not even finish. It hangs inside the timeit function.
# Otherwise it finishes in between 3 and 10 minutes on my machine...
>>>timeit.timeit("import numpy as np; np.fft.rfft(np.empty(1037108))", number=1) 

A way to work around this, without relying on the nfft_pow2 option is to monkey patch numpy's fft with the implementation provided by pyfftw:

# Untested!
import pyfftw.interfaces
import numpy.fft

numpy.fft = pyfftw.interfaces.numpy_fft

# Turn on the cache for optimum performance
pyfftw.interfaces.cache.enable()

Concluding I would say that having the nfft_pow2 options might be necessary...

@megies
Copy link
Member

megies commented Jan 17, 2014

Oh my, this is crazy.. no way to know beforehand for which npts this is in issue, I guess? I would really like to have as little low-level options as possible in here.. :/

@megies
Copy link
Member

megies commented Jan 17, 2014

@bgoutorbe, can you find out the exact npts of the traces you had problems with and post them here? It seems the problem might be related to large prime numbers in the factorization of npts. If that is what was causing you problems we could work around the issue internally.

@bgoutorbe
Copy link
Author

this is exactly as krischer said: on both my home and work computers, rfft acts weird with some particular nb of samples:

timeit.timeit("import numpy as np; np.fft.rfft(np.empty(1037109))", number=1)
0.589202880859375
timeit.timeit("import numpy as np; np.fft.rfft(np.empty(1037108))", number=1)
--> does not finish
timeit.timeit("import numpy as np; np.fft.rfft(np.empty(900001))", number=1)
--> does not finish either (and corresponds to the nb of pts in my trace)

...quite crazy :/

So, using megies' implementation:

tr.data = np.random.random(int(1e6))
%timeit tr.copy().remove_response(nfft_pow2=True)
--> 1 loops, best of 3: 18.5 s per loop
%timeit tr.copy().remove_response(nfft_pow2=False)
--> 1 loops, best of 3: 17.5 s per loop

BUT with a cursed number:

tr.data = np.random.random(900001)
%timeit tr.copy().remove_response(nfft_pow2=True)
--> 1 loops, best of 3: 18.4 s per loop
%timeit tr.copy().remove_response(nfft_pow2=False)
--> does not finish at all!

@emolch
Copy link

emolch commented Jan 17, 2014

From my experience it (zero-padding to pow2 length) can safely and always (or by default) be done inside the remove_response method. I have always done it like that for performance reasons in the pyrocko.trace.Trace.transfer function, which probably works similar.

Only sad that this trick does not work with resampling because of the fixed ratio between the lengths of the forward and backward transforms, so only one of the transforms can be made with a pow2 length.

@megies
Copy link
Member

megies commented Jan 17, 2014

I looked into the relation of npts to computation time a bit.. here's what I got:
figure_aa
figure_ab
figure_ac

The baseline of normal non-nextpow2 seems linear to me, but when large primes are involved (in the factorization of npts) then the computation time seems to be quadratic function of npts.
figure_a
figure_b

The factorization is fast so I think I'll include a check for large primes in npts and use next power of 2 internally only if we hit a large prime.

@bgoutorbe
Copy link
Author

@megies, great and extremely interesting work!

in case large primes are encountered in nfft factorization (which makes
the computation take forever) we simply switch to using the next power
of 2, which is not the fastest possibility but has robust computation
time.
@megies
Copy link
Member

megies commented Jan 17, 2014

Oops, this has to go into releases not master..

@megies megies closed this Jan 17, 2014
@megies
Copy link
Member

megies commented Jan 19, 2014

Here's a few plots (2D histograms) to summarize this issue:

A) Original state of Trace.remove_response() and Trace.simulate() (without specifying nfft_pow2=True) before pull request. Some npts that result in nfft points that have large primes in their factorization have very slow calculation time (some quadratic function, not acceptable for traces with large number of samples; does not finish for some bad values)
figure_x1

B) behavior for Trace.simulate(nfft_pow2=True). Failsafe but up two times slower.
figure_x2

C) If encountering a bad nfft value use next power of 2 (like nfft_pow2 does).
figure_x3

D) new behavior for Trace.remove_response() and Trace.simulate() (when not explicitely specifying nfft_pow2=True -- this behavior is left unchanged). If encountering a bad nfft value try some slightly larger values (10 tries), if no good value is encountered use next power of 2 (like nfft_pow2 does).
figure_xxx

megies added a commit that referenced this pull request Jan 19, 2014
@QuLogic QuLogic added this to the 0.9.2 milestone Oct 20, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants