
[Question] How to efficiently find long period signals in data with long gaps? #88

Open
jpdeleon opened this issue Jan 14, 2021 · 20 comments
Labels
bug Something isn't working

Comments

@jpdeleon

Can you suggest a strategy for efficiently finding transits in light curves with a long gap, e.g. 1 year, as in the case of EPICs observed in K2 Campaign 16 and again in C18? I could perhaps limit the period search space using period_min and period_max, but I wonder if you have a better solution.

@jpdeleon added the bug label on Jan 14, 2021
@hippke
Owner

hippke commented Jan 14, 2021

Do you have a good example to test? Code + data

@martindevora

martindevora commented Mar 16, 2021

Hi. I was coming here to ask something similar. The point is that grid.py uses time_span to determine the sampling of the period_grid. When long time gaps are present (e.g. TICs observed in sector 1 and sector 27), the period_grid becomes extremely dense (something like 400,000 points) and the computational effort increases significantly.

Do you think it would make sense to calculate the time_span in a different way? I know we can reduce the period_grid density by lowering the oversampling, but the minimum value for this input is 1, which can still be insufficient to prevent the increase in computational cost. The sketch below illustrates the scaling.
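
To make the scaling concrete, here is a minimal sketch using the period_grid helper that TLS exposes (the time_span values are rough placeholders I am assuming for two back-to-back sectors versus sectors 1 and 27):

from transitleastsquares import period_grid

# Contiguous baseline: two back-to-back TESS sectors, roughly 55 days
contiguous = period_grid(R_star=1.0, M_star=1.0, time_span=55)

# Gapped baseline: sectors 1 and 27 span roughly 700 days of calendar time,
# even though only ~55 days of that span contain data
gapped = period_grid(R_star=1.0, M_star=1.0, time_span=700)

# The second grid is far denser, although the amount of data is the same
print(len(contiguous), len(gapped))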

@martindevora

> Can you suggest a strategy for efficiently finding transits in light curves with a long gap, e.g. 1 year, as in the case of EPICs observed in K2 Campaign 16 and again in C18? I could perhaps limit the period search space using period_min and period_max, but I wonder if you have a better solution.

Just a suggestion, and maybe @hippke has a better idea or can refute it: you could set the oversampling_factor parameter to 1 (it defaults to 3 in tls_constants.py). That would reduce the density of your period grid and make your computations lighter, along the lines of the sketch below.
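
For concreteness, a minimal sketch (the time and flux arrays here are dummy placeholders for a real detrended light curve):

import numpy as np
from transitleastsquares import transitleastsquares

# Dummy stand-ins for a real light curve
time = np.linspace(0, 27, 10000)
flux = np.ones_like(time)

model = transitleastsquares(time, flux)
results = model.power(
    period_min=0.5,          # limit the search space explicitly
    period_max=20,
    oversampling_factor=1,   # default is 3; lower gives a coarser period grid
)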

@hippke
Owner

hippke commented Mar 16, 2021

Ah, I see. At the time I tested TLS, I used a lot of Kepler and early TESS data with only minor gaps, so that's a use case I had not considered. Unfortunately, there is an inevitable computational burden when you go to a longer time baseline (with a corresponding gain in sensitivity to longer-period planets). So there's no easy solution, i.e. no free lunch: making TLS run faster will make it less sensitive. However, that may still be OK or even desired, depending on the use case.
Can you provide a full working example that I can check out myself?

@martindevora

martindevora commented Mar 16, 2021

I think it would be feasible, without much effort, to add an option for the user to provide the period grid. What do you think, @hippke? If not provided, TLS would fall back to the default grid strategy used nowadays. A hypothetical sketch follows.
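
Purely as a hypothetical sketch of the proposed API (a periods keyword does not exist in TLS; it is only meant to illustrate the idea):

import numpy as np

# A custom, log-spaced period grid built by the user
custom_periods = np.geomspace(0.5, 20.0, 5000)

# Proposed, not implemented: pass the grid straight to power()
# results = model.power(periods=custom_periods)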

@hippke
Owner

hippke commented Mar 16, 2021

Sure, but what's it good for? The user can already request a grid with any [from, to] bounds in whatever log-spacing they want.

@martindevora

Do you mean by using the oversampling factor? I thought the lower limit for that value was 1, because numpy gave me an error saying that only integers were allowed. I will try to reproduce that situation, but if true, the grid density would be restricted.

@hippke
Owner

hippke commented Mar 16, 2021

Yes, you should be able to use any floating-point value for oversampling_factor, e.g. 0.5.
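
For instance, using the exposed period_grid helper (the time_span value is a placeholder):

from transitleastsquares import period_grid

coarse = period_grid(R_star=1.0, M_star=1.0, time_span=700, oversampling_factor=0.5)
default = period_grid(R_star=1.0, M_star=1.0, time_span=700, oversampling_factor=3)
# The grid size scales roughly linearly with the oversampling factor
print(len(coarse), len(default))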

@martindevora

> Do you mean by using the oversampling factor? I thought the lower limit for that value was 1, because numpy gave me an error saying that only integers were allowed. I will try to reproduce that situation, but if true, the grid density would be restricted.

I'm unable to reproduce it. Therefore I agree with you that period_min, period_max, and oversampling_factor (given that it can be lower than 1) are enough for us users to tune the period grid. I hope that satisfies the OP's needs.

@martindevora

martindevora commented Mar 16, 2021

Hi. I reproduced it. See the attached error traceback with oversampling_factor = 3.75 as an example:

arrays used as indices must be of integer (or boolean) type
Traceback (most recent call last):
    results = model.power(**power_args)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/main.py", line 263, in power
    SR, power_raw, power, SDE_raw, SDE = spectra(chi2, self.oversampling_factor)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/stats.py", line 119, in spectra
    my_median = running_median(power_raw, kernel)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/helpers.py", line 96, in running_median
    med = numpy.median(data[idx], axis=1)
IndexError: arrays used as indices must be of integer (or boolean) type
arrays used as indices must be of integer (or boolean) type

Process finished with exit code 0

With this in mind, we cannot go below oversampling_factor = 1, so we'd be restricted.
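
If it helps, the mechanism is plausibly that a non-integer oversampling_factor leaks into the running-median kernel as a float, and numpy refuses float arrays as indices. A minimal numpy reproduction of that behaviour (my assumption about the internals, not a confirmed diagnosis):

import numpy as np

data = np.arange(10.0)
idx = np.arange(7.5)             # a float stop yields a float-dtype index array
# data[idx]                      # IndexError: arrays used as indices must be
                                 # of integer (or boolean) type
print(data[idx.astype(int)])     # casting to int restores valid indexing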

@martindevora

martindevora commented Mar 18, 2021

Do you think the error found above is a bug, or is it expected?

@hippke
Owner

hippke commented Mar 20, 2021

Looks like a bug. Can you share the code to reproduce? I will look into it and make a bugfix if possible.

@martindevora

martindevora commented Mar 20, 2021

Sure. Here it is for transitleastsquares version 1.0.25 and using lightkurve v2:

import lightkurve
import transitleastsquares as tls
import multiprocessing

# Download all SPOC short-cadence light curves for sector 2 and stitch them
lcf = lightkurve.search_lightcurve("TIC 305048087", mission="TESS", cadence="short", sector=[2], author="SPOC")\
    .download_all()
lc = lcf.stitch().remove_nans()
model = tls.transitleastsquares(lc.time.value, lc.flux.value)
# The non-integer oversampling_factor is what triggers the IndexError below
results = model.power(period_min=0.5, period_max=20, use_threads=multiprocessing.cpu_count(), oversampling_factor=3.5)

And the entire output:

Transit Least Squares TLS 1.0.25 (04 June 2020)
Creating model cache for 38 durations
/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/transit.py:157: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  lc_arr = numpy.array(lc_arr)
  0%|          | 0/2919 periods | 00:00<?Searching 18317 data points, 2919 periods from 0.601 to 13.702 days
Using all 8 CPU threads
100%|██████████| 2919/2919 periods | 01:14<00:00
Traceback (most recent call last):
  File "/home/martin/git_repositories/sherlockpipe/test.py", line 9, in <module>
    results = model.power(period_min=0.5, period_max=20, use_threads=multiprocessing.cpu_count(), oversampling_factor=3.5)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/main.py", line 263, in power
    SR, power_raw, power, SDE_raw, SDE = spectra(chi2, self.oversampling_factor)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/stats.py", line 119, in spectra
    my_median = running_median(power_raw, kernel)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/helpers.py", line 96, in running_median
    med = numpy.median(data[idx], axis=1)
IndexError: arrays used as indices must be of integer (or boolean) type

Kind regards.

@hippke
Owner

hippke commented Mar 20, 2021

Fixed in version 1.0.26, which is also available via pip.
oversampling_factor now accepts any floating-point value.

@martindevora

Great! I would say that we can now explore data with long gaps at the same computational cost as consecutive sectors. To me this is clear enough and answers the question. However, I am not the OP.

@martindevora

martindevora commented Mar 21, 2021

Hi. I've already tried the new version. Sorry for bringing this up again: it works sometimes, but I found a case where it fails with a new error. My code is:

import lightkurve
import transitleastsquares as tls
import multiprocessing

# Two sectors separated by a long gap: sector 13 and sector 27
lcf = lightkurve.search_lightcurve("TIC 352315023", mission="TESS", cadence="short", sector=[13, 27], author="SPOC")\
    .download_all()
lc = lcf.stitch().remove_nans()
lc = lc.remove_outliers(sigma_lower=float('inf'), sigma_upper=3)  # clip only upward outliers
model = tls.transitleastsquares(lc.time.value, lc.flux.value)
results = model.power(period_min=0.45, period_max=40, use_threads=multiprocessing.cpu_count() - 1,
                      oversampling_factor=1.1119355997446583, T0_fit_margin=0.05, duration_grid_step=1.1)

And the error:

Transit Least Squares TLS 1.0.26 (20 March 2021)
Creating model cache for 45 durations
/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/transit.py:157: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  lc_arr = numpy.array(lc_arr)
  0%|          | 0/15929 periods | 00:00<?Searching 33168 data points, 15929 periods from 0.602 to 39.985 days
Using all 8 CPU threads
100%|██████████| 15929/15929 periods | 13:26<00:00
Traceback (most recent call last):
  File "/home/martin/git_repositories/sherlockpipe/test.py", line 10, in <module>
    results = model.power(period_min=0.45, period_max=40, use_threads=multiprocessing.cpu_count(),
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/main.py", line 263, in power
    SR, power_raw, power, SDE_raw, SDE = spectra(chi2, self.oversampling_factor)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/stats.py", line 119, in spectra
    my_median = running_median(power_raw, kernel)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/helpers.py", line 97, in running_median
    med = numpy.median(data[idx], axis=1)
IndexError: index 15929 is out of bounds for axis 0 with size 15929

Process finished with exit code 1
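
The new failure looks like an off-by-one: with a non-integer factor, the last running-median window apparently indexes one element past the end of the power array. A sketch of the kind of bounds guard that would avoid it (my illustration, not the actual helpers.py code):

import numpy as np

def running_median_clipped(data, kernel):
    # Running median whose window indices are clipped to the array bounds.
    # Illustrative only; not the TLS implementation.
    kernel = int(kernel)
    starts = np.arange(len(data) - kernel + 1)
    idx = starts[:, None] + np.arange(kernel)[None, :]  # integer window indices
    idx = np.clip(idx, 0, len(data) - 1)                # guard against overruns
    return np.median(data[idx], axis=1)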

@martindevora

Hi. Sorry to bother you again; I just wanted to keep the issue alive so we can get a fix as soon as you find it possible.
Kind regards.

@hippke
Owner

hippke commented Mar 25, 2021

Sorry you are having trouble. Looking into it right now.

The runtime of your script is ~1 minute on my Core-i7 7700k using 7 threads (and considerable background load). Your runtime shows as 13 minutes. Are you using a slow machine, or is there something else going on?

Why not use oversampling_factor=1?

I have made a change to the code which fixes the issue, but it raises some nasty warnings. Before I release a new version, can you test the fix? You need to replace the file helpers.py in your folder /home/martin/.local/lib/python3.8/site-packages/transitleastsquares/ with the attached version.
helpers.zip

@martindevora

martindevora commented Mar 25, 2021

Hi. Because oversampling_factor=1 is also still suboptimal: the period_grid remains denser than desired. The test I was running used an automatically calculated oversampling factor. That value will probably be lower than 1 for other targets (or whenever we change the equation that calculates it), so I would like to stick to float values instead of integers.
The runtime I reported was very long, probably because I paused the code at a breakpoint for a while and did some debugging.
I will try your suggested file and come back with the output.

@martindevora

Hi, here are the results (the execution seems to have worked fine, i.e. no errors):

Transit Least Squares TLS 1.0.26 (20 March 2021)
Creating model cache for 45 durations
/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/transit.py:157: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  lc_arr = numpy.array(lc_arr)
  0%|          | 0/15929 periods | 00:00<?Searching 33168 data points, 15929 periods from 0.602 to 39.985 days
Using all 8 CPU threads
100%|██████████| 15929/15929 periods | 03:49<00:00
/home/martin/.local/lib/python3.8/site-packages/numpy/core/_methods.py:261: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/home/martin/.local/lib/python3.8/site-packages/numpy/core/_methods.py:221: RuntimeWarning: invalid value encountered in true_divide
  arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/home/martin/.local/lib/python3.8/site-packages/numpy/core/_methods.py:253: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/main.py:404: UserWarning: 292 of 329 transits without data. The true period may be twice the given period.
  warnings.warn(text)

Those ugly warnings were already appearing in a fresh installation I was running from another project that also uses TLS. I don't know what causes them, but they probably need a fix too. Thanks for working on this! And please keep me posted whenever you consider it fixed.
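
For what it's worth, the RuntimeWarnings match what numpy emits when a variance is taken over an empty slice, which could plausibly happen for trial periods whose windows fall entirely inside the data gap (my assumption, not a confirmed diagnosis):

import numpy as np

# Reproduces "Degrees of freedom <= 0 for slice" and the
# "invalid value encountered" warnings seen in the log above
print(np.var(np.array([])))  # -> nan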
