Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Split PrepPipeline into separate methods, make final interpolation optional #99

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

a-hurst
Copy link
Collaborator

@a-hurst a-hurst commented Jul 3, 2021

PR Description

(Eventually) closes #73. Currently, this PR:

  • Adds separate remove_line_noise and robust_reference methods to PrepPipeline.
  • Makes final bad channel interpolation optional.
  • Removes a bunch of unused matplotlib-based MATLAB comparison code in the PrepPipeline unit tests.
  • Removes support for notch filter types other than spectrum_fit, since you can now just filter line noise however you want and then run PrepPipeline.robust_reference if you want to use another kind. This vastly simplifies the demands on the filter_kwargs API and should make the PrepSettings filtering options much easier to document.

There's still definitely more cleanup to do, but I figured it'd be good to get the core of this up for review sooner than later!

Merge Checklist

  • the PR has been reviewed and all comments are resolved
  • all CI checks pass
  • (if applicable): the PR description includes the phrase closes #<issue-number> to automatically close an issue
  • (if applicable): bug fixes, new features, or API changes are documented in whats_new.rst

@a-hurst
Copy link
Collaborator Author

a-hurst commented Jul 3, 2021

Whoops, looks like one of the examples relies on an undocumented attribute I removed for the sake of RAM (self.EEG_new). Will address that tomorrow.

@codecov-commenter
Copy link

codecov-commenter commented Jul 3, 2021

Codecov Report

Merging #99 (0688a5c) into master (ec44384) will decrease coverage by 0.35%.
The diff coverage is 95.34%.

❗ Current head 0688a5c differs from pull request most recent head 6ff2755. Consider uploading reports for the commit 6ff2755 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master      #99      +/-   ##
==========================================
- Coverage   99.04%   98.68%   -0.36%     
==========================================
  Files           7        7              
  Lines         734      762      +28     
==========================================
+ Hits          727      752      +25     
- Misses          7       10       +3     
Impacted Files Coverage Δ
pyprep/reference.py 97.88% <90.00%> (-1.30%) ⬇️
pyprep/prep_pipeline.py 98.59% <100.00%> (-1.41%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ec44384...6ff2755. Read the comment docs.

Copy link
Owner

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, it's great that we don't need the matplotlib stuff in the tests anymore. Also reminds me of #15

@sappelhoff
Copy link
Owner

It is possible that I introduced these conflicts 😖 it's because I worked on dbe2062 before pulling master ... the pulled master, rebased, and force pushed. That was a bad idea - I hope nothing serious got destroyed.

@a-hurst
Copy link
Collaborator Author

a-hurst commented Jul 17, 2021

It is possible that I introduced these conflicts 😖 it's because I worked on dbe2062 before pulling master ... the pulled master, rebased, and force pushed. That was a bad idea - I hope nothing serious got destroyed.

No worries, I'll rebase before I push any more work! I've set this aside for a few days while tackling some other projects, but I'll try and get this wrapped up within the next week or so :)

@a-hurst
Copy link
Collaborator Author

a-hurst commented Aug 4, 2021

Okay @sappelhoff, I've gotten the PrepPipeline API rewrite to a point I think I'm happy with (though I'd love to hear your thoughts and @yjmantilla's). Some of the attributes aren't documented yet and much of it is untested, but I figured it'd be best to handle that side of things once I was sure we were all happy with the changes.

My main changes here are:

  • Renamed EEG_before_interpolation to EEG_post_reference and reference_before_interpolation to robust_reference_signal
  • Added a getter method current_reference_signal that returns reference_before_interpolation if interpolation hasn't been done, or the post-interpolation average reference (previously reference_after_interpolation) if it has (and None if neither have been performed, though that could be changed).
  • Moved all the loose noisy_channels_original, noisy_channels_before_interpolation, bad_before_interpolation, still_noisy_channels, etc. attributes into two dicts: noisy_info (which contains the full noisy channel dicts) and bad_channels (which just contains the bad channel names), each with the keys 'original', 'post-reference', and 'post-interpolation'. This makes things easier to document and has the advantage of a more consistent/informative naming scheme.
  • Added current_noisy_info and remaining_bad_channels attributes that get the current noisy info and bad channel names for the pipeline (will retrieve post-interpolation values if interpolation was used, otherwise returns the pre-interpolation values). These allow for people to try enabling/disabling interpolation for their data without having to rewrite their code to access different attributes.
  • Added a new get_raw method, which is a more flexible version of the current raw getter that allows retrieving the full mne.io.Raw object with the EEG data from any given stage in the pipeline.

Let me know what you think!

@sappelhoff sappelhoff marked this pull request as ready for review August 5, 2021 08:19
@sappelhoff
Copy link
Owner

I didn't take a look at the diff because I am not sure what diff will stay after #99 (comment) is resolved, but I still have concerns over these two points (I am commenting only based on your summary comment):

returns reference_before_interpolation if interpolation hasn't been done, or the post-interpolation average reference (previously reference_after_interpolation) if it has

and

(will retrieve post-interpolation values if interpolation was used, otherwise returns the pre-interpolation values

Could we not just return a dict in both cases, with the keys pre_interpolation and post_interpolation ... and the values for these keys being None or empty dicts if these values have not yet been computed? I think that'd be clearer. An alternative would be to send a log message upon the getter call that says something like: "you are getting the post_interpolation values". The minimum would be to have an attribute that can clearly tell you whether the prep object is "pre" or "post" interpolation, but we might already have something like that, not sure right now.

What do you think?

@a-hurst
Copy link
Collaborator Author

a-hurst commented Aug 5, 2021

Could we not just return a dict in both cases, with the keys pre_interpolation and post_interpolation ... and the values for these keys being None or empty dicts if these values have not yet been computed? I think that'd be clearer. An alternative would be to send a log message upon the getter call that says something like: "you are getting the post_interpolation values". The minimum would be to have an attribute that can clearly tell you whether the prep object is "pre" or "post" interpolation, but we might already have something like that, not sure right now.

That could work, though I still think it'd be useful to have an attribute like current_reference_signal so there's a consistent way to get the final reference signal regardless if interpolation was enabled or disabled. For the last project we did with PyPREP we tried the analysis pipeline both with and without final bad channel interpolation, so I think that having an API where you can toggle interpolation on and off without needing to change any attributes you access afterwards would be handy.

Instead of attributes, maybe methods in the style of the new get_raw function would be clearer? For example, a get_noisy_info method that returns the most recent noisy info available by default (like current_noisy_info does now), but also has an argument that lets you specify explicitly whether you want the original, post-reference, or post-interpolation noisy info.

Keep in mind that for the PrepPipeline API I decided against exposing interpolation as a separate method and instead made it an optional flag to robust_reference(), so it should hopefully be clear to users based on the settings they chose (and the documentation) whether the "current reference signal" reflects interpolated data or not.

Copy link
Owner

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of attributes, maybe methods in the style of the new get_raw function would be clearer? For example, a get_noisy_info method that returns the most recent noisy info available by default (like current_noisy_info does now), but also has an argument that lets you specify explicitly whether you want the original, post-reference, or post-interpolation noisy info.

yes! That sounds good to me, I like the new get_raw 👍 But I think the default should be "current", not "None", to be more explicit.

I approve of the remaining changes, but I am slowly losing my grasp on the potential workflows we can have with our different classes and their methods 😇 it's been too long that I actually worked with this code, I think.


Parameters
----------
stage : str, optional
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd do something like this:

Suggested change
stage : str, optional
stage : {"unprocessed", "filtered", "post-reference", "post-interpolation"}, optional

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That causes the line to go beyond 88 characters, is line wrap for argument types something that's supported by Numpy docstyle?

Comment on lines +250 to +251
"Could not retrieve {stage} data, as that stage of the pipeline "
"has not yet been performed."
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Could not retrieve {stage} data, as that stage of the pipeline "
"has not yet been performed."
f"Could not retrieve {stage} data, as that stage of the pipeline "
"has not yet been performed."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, nice catch!

@a-hurst
Copy link
Collaborator Author

a-hurst commented Aug 7, 2021

yes! That sounds good to me, I like the new get_raw 👍 But I think the default should be "current", not "None", to be more explicit.

Great! I'll get to work on this, as well as proper tests and docs for everything new. In that case, for the sake of simplicity I'm going to leave the internal noisy info and reference signal attributes undocumented so that the new get_x methods are the one clear official way for users to get at that data.

I approve of the remaining changes, but I am slowly losing my grasp on the potential workflows we can have with our different classes and their methods 😇 it's been too long that I actually worked with this code, I think.

Hopefully cleaning up the example scripts for 0.4 will refresh us all on that front!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend PrepPipeline to allow performing PREP in stages
3 participants