custom process-function options #273

Open
calum-chamberlain opened this issue Aug 30, 2018 · 8 comments
Comments

@calum-chamberlain
Member

Is your feature request related to a problem? Please describe.
Gaps in seismic data cause most of the issues with normalisation of correlations. EQcorrscan's pre_processing functions take care of gaps pretty well now, but it would be good to expose users to how gaps are handled so that they can easily write their own custom process functions (e.g. adding processing steps, or using a different type of filtering, decimating rather than resampling...) that also handle gaps in the way the correlation functions expect.

It would also be useful if the match_filter objects allowed a custom process-function to be specified (in the same way that users can already specify any correlation function they want). This would allow more people to use those objects.

Describe the solution you'd like

  1. Refactor gap handling as a context manager;
  2. Provide a new keyword arg for match-filter objects of process_func.

Both would require docs and tutorials to make it clear how they should be used - in general the docs are in real need of a tidy.

  1. pre_processing._fill_gaps and pre_processing._zero_pad_gaps would be repurposed as __enter__ and __exit__ functions on a HandleGaps context manager. The API would end up looking something like this:
from eqcorrscan.pre_processing import HandleGaps

with HandleGaps(tr):
    custom_processing(tr)
  2. This would be fairly simple: add an extra argument and, when calling the processing functions, check whether it has been set; if not, use the built-in processing functions.
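
A minimal sketch of the second point, not EQcorrscan's actual API. Only the keyword name `process_func` comes from the proposal above; the `Detector` class, `default_process` and `halve` are hypothetical stand-ins, and plain lists of floats stand in for trace data.

```python
from typing import Callable, List, Optional

Samples = List[float]


def default_process(data: Samples) -> Samples:
    """Hypothetical stand-in for the built-in processing: remove the mean."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]


class Detector:
    """Hypothetical stand-in for a match-filter object."""

    def __init__(self, process_func: Optional[Callable[[Samples], Samples]] = None):
        # Use the caller's function if one was given, otherwise the built-in one.
        self.process_func = process_func if process_func is not None else default_process

    def prepare(self, data: Samples) -> Samples:
        """Process data with whichever function was selected at construction."""
        return self.process_func(data)


def halve(data: Samples) -> Samples:
    """Example of a user-supplied processing function."""
    return [x / 2 for x in data]


default_result = Detector().prepare([1.0, 2.0, 3.0])
custom_result = Detector(process_func=halve).prepare([1.0, 2.0, 3.0])
```

Checking for `None` rather than truthiness keeps the fallback explicit, which mirrors the "check whether it has been set" wording above.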
@d-chambers
Collaborator

Hey @calum-chamberlain,

This looks interesting. I love the idea of simply doing the thing most people will want by default, but allowing users to modify the default behaviour when needed. A few thoughts/questions:

  1. It's probably better to have the __enter__ method return a trace/stream rather than assuming it will operate in place. This will allow the logic of the pre-processor to operate in place or not, and then just return the resulting object. So the API could look something like this:
from eqcorrscan.pre_processing import HandleGaps

with HandleGaps(tr) as trr:
    custom_processing(trr)
  2. What would the clean-up of the context manager do? The main strength of the context manager is ensuring the __exit__ method gets called regardless of unhandled exceptions. Is there something you had in mind that needs to happen once the HandleGaps scope exits, or could a function call suffice to save a level of indentation?

  3. Users may want to have several pre-processing methods in a particular order. It may be useful to provide a way to chain them together. Something similar to scikit-learn's Pipeline maybe?
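
The chaining idea in the last point could be sketched roughly as follows. This is only loosely modelled on the Pipeline concept: `ProcessingPipeline`, `demean` and `decimate_by_two` are hypothetical names, and plain lists of floats stand in for trace data.

```python
from typing import Callable, List, Sequence

Samples = List[float]
Step = Callable[[Samples], Samples]


class ProcessingPipeline:
    """Hypothetical helper that applies processing steps in a fixed order."""

    def __init__(self, steps: Sequence[Step]):
        self.steps = list(steps)

    def __call__(self, data: Samples) -> Samples:
        # Feed the output of each step into the next one.
        for step in self.steps:
            data = step(data)
        return data


def demean(data: Samples) -> Samples:
    """Subtract the mean from every sample."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]


def decimate_by_two(data: Samples) -> Samples:
    """Keep every second sample (no anti-alias filter; illustration only)."""
    return data[::2]


pipeline = ProcessingPipeline([demean, decimate_by_two])
result = pipeline([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because the pipeline is itself a callable that maps samples to samples, it could be passed anywhere a single processing function is accepted.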

@calum-chamberlain
Member Author

Thanks for those @d-chambers (also thanks for the book recommendation, I'm chewing my way through it, and almost every page has something of great interest!).

At the moment, the function _fill_gaps is run before filtering and resampling, and _zero_pad_gaps is used after processing to fill the gaps found by _fill_gaps with zeros. I was thinking that _fill_gaps would be the equivalent of an __enter__ and _zero_pad_gaps would be used as __exit__. What these functions do is:

  • _fill_gaps finds gaps in data and interpolates over them to enforce a continuous trace for processing, it returns the trace and the gap positions;
  • _zero_pad_gaps takes the processed trace and the gap positions and replaces the values in the gaps positions with zeros to ensure that correlations are zero in the window where there was originally no data.

Do you think that would work? I was planning on it working in-place...

  3. That pipeline idea looks interesting. I'm not sure how I would implement it, but it could be something fun in the future.
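
A rough sketch of how the two functions described above might map onto `__enter__` and `__exit__`, working in place. This is not EQcorrscan's implementation: the trace is a plain list with `None` marking missing samples, `fill_gaps` here is a simplified stand-in for `_fill_gaps`, and the sketch assumes the custom processing keeps the number of samples unchanged (a real version would need to track gap times so that resampling still works).

```python
from typing import List, Optional, Tuple

Gap = Tuple[int, int]  # (start, end) sample indices, end exclusive


def fill_gaps(data: List[Optional[float]]) -> Tuple[List[float], List[Gap]]:
    """Linearly interpolate over runs of None; return filled data and gap spans."""
    filled = list(data)
    gaps: List[Gap] = []
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i
        gaps.append((start, end))
        left = filled[start - 1] if start > 0 else None
        right = filled[end] if end < n else None
        if left is None and right is None:
            left = right = 0.0  # the whole trace is a gap
        elif left is None:
            left = right  # gap at the start: hold the first real value
        elif right is None:
            right = left  # gap at the end: hold the last real value
        span = end - start + 1
        for k in range(start, end):
            filled[k] = left + (right - left) * (k - start + 1) / span
    return filled, gaps


class HandleGaps:
    """Hypothetical context manager: fill gaps on entry, zero them on exit."""

    def __init__(self, trace: List[Optional[float]]):
        self.trace = trace
        self.gaps: List[Gap] = []

    def __enter__(self) -> List[Optional[float]]:
        filled, self.gaps = fill_gaps(self.trace)
        self.trace[:] = filled  # modify the caller's list in place
        return self.trace

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Replace the originally missing samples with zeros.
        for start, end in self.gaps:
            self.trace[start:end] = [0.0] * (end - start)
        return False  # do not swallow exceptions from the processing


trace = [1.0, None, None, 4.0, 5.0]
with HandleGaps(trace) as tr:
    interpolated = list(tr)  # continuous data seen by the processing
    tr[:] = [2 * x for x in tr]  # stand-in for custom processing
```

Returning the trace from `__enter__` supports both the `with HandleGaps(tr):` and the `with HandleGaps(tr) as trr:` spellings discussed in this thread.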

@d-chambers
Collaborator

No problem, that book is incredible, I learned a ton from it. There are still parts of it, especially the async stuff, that I am struggling to wrap my head around.

OK, so _zero_pad_gaps actually acts on the resulting correlogram, correct? Yeah, that makes sense to me.

@calum-chamberlain
Member Author

Ah, no, _zero_pad_gaps just works on the trace data... this would just encompass a pre-processing (filter and resample, not correlate) process... Does that make sense? The flow is something like:

  1. Read in data that has some gaps into a Stream with multiple segments;
  2. Call _fill_gaps to make the data continuous;
  3. Filter, resample and anything else;
  4. Call _zero_pad_gaps to cut out data from the gap positions determined by _fill_gaps, and replace with zeros.
  5. Call match-filter, the correlation function returns zeros when there are fewer than two non-zero samples in the correlation window.

I was imagining having step 2 (the _fill_gaps call) as __enter__ and step 4 as __exit__. It's not easy to edit the correlogram because the stacked correlogram is returned for memory efficiency.
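
To illustrate step 5, here is a small pure-Python sketch of a normalised cross-correlation that returns zero for windows with fewer than two non-zero samples. Only that rule comes from the description above; the function name and the rest of the implementation are assumptions and do not reflect EQcorrscan's optimised correlation routines.

```python
import math
from typing import List


def normalised_correlation(template: List[float], data: List[float]) -> List[float]:
    """Slide template along data, returning one correlation value per window."""
    n = len(template)
    t_mean = sum(template) / n
    t_dev = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t_dev))
    out: List[float] = []
    for start in range(len(data) - n + 1):
        window = data[start:start + n]
        # Zero-padded gaps leave too few non-zero samples: force a zero.
        if sum(1 for x in window if x != 0.0) < 2:
            out.append(0.0)
            continue
        w_mean = sum(window) / n
        w_dev = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w_dev))
        if t_norm == 0.0 or w_norm == 0.0:
            out.append(0.0)  # flat template or window: correlation undefined
            continue
        out.append(sum(a * b for a, b in zip(t_dev, w_dev)) / (t_norm * w_norm))
    return out


# The first three samples represent a zero-padded gap.
cc = normalised_correlation([1.0, 2.0, 1.0], [0.0, 0.0, 0.0, 1.0, 2.0, 1.0])
```

This is why zero-padding in step 4 matters: windows that fall inside a gap yield exactly zero rather than a spurious correlation from interpolated data.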

@d-chambers
Collaborator

So the context manager is specifically to enforce _fill_gaps being called first and _zero_pad_gaps being called last in the preprocessing, correct? I can see why that would be useful and it does seem like a good fit for a context manager to me.

@calum-chamberlain
Member Author

Yup, that's it - I'm hoping that it would be simple for people to use as well - it's the only bit of the processing functions that I would say is really required for the correlation functions. Everything else could/should be personal preference.

@calum-chamberlain calum-chamberlain added this to the 0.3.3 milestone Oct 26, 2018
@calum-chamberlain
Member Author

Playing around more with this, I don't think the context manager fits and I'm just going to expose the (previously "private") gap handling functions.

@calum-chamberlain calum-chamberlain added this to To do in 0.4.0 via automation Nov 18, 2018
@calum-chamberlain calum-chamberlain modified the milestones: 0.3.3, 0.4.0 Nov 18, 2018
@calum-chamberlain
Member Author

Working on adding custom processing functions here

@calum-chamberlain calum-chamberlain changed the title gap-handling context manager and custom process-function options custom process-function options Jan 22, 2019
@calum-chamberlain calum-chamberlain removed this from To do in 0.4.0 May 16, 2019
@calum-chamberlain calum-chamberlain modified the milestones: 0.4.0, future May 16, 2019
@calum-chamberlain calum-chamberlain modified the milestones: future, 0.5.0 Aug 14, 2020