New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added a simple MLP neural network for wet-dry classification #146
Conversation
Thanks @eoydvin 👍 I will add some (probably minor) comments. One question: Do you have a notebook where you show the application? We do not yet have a notebook that compares the different wet-dry classification methods. But maybe you could start by adding a very simple and minimal notebook for your method. Then we can later add the other methods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I only have some minor comments.
pycomlink/processing/wet_dry/mlp.py
Outdated
threshold=None, # 0.5 is often good, or argmax | ||
): | ||
""" | ||
Wet dry classification using a simple neural network based on channel 1 and channel 2 of a CML |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you state here some more details or is there a document that you can reference?
E.g. what are the details of the network (MLP, but how many neurons, layers)? What it the sample length, i.e. what is the minimum length of the time series that has to be supplied? Explain if and how the model is applied in a sliding window. How is the NaN handling?
I know that the CNN wet-dry also has very little info in the doc string, but it has the paper with many details. (not saying that we need a paper or somehting similar here...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will provide this somehow, Max gave me this idea of publishing it as a technical note somewhere..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can just be 3-4 lines of text in the doc string. That will be sufficient. But right now the user as absolutely no idea what the function uses. Of course, feel free to write a "technical note" paper any time ;-)
pycomlink/processing/wet_dry/mlp.py
Outdated
trsl_channel_2 : iterable of float | ||
Time series of received signal level of channel 2 | ||
threshold : float | ||
Threshold (0 - 1) for setting event as wet or dry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have the option of setting this to None
and return the continuous output instead of the binary one derived with the threshold. The threshold can easily be applied later and this might make it easier to create a ROC curve where you want to sweep over the thresholds.
update: just saw that None
is default, but your docstring is not correctly telling us that ;-)
pycomlink/tests/test_wet_dry_mlp.py
Outdated
|
||
np.testing.assert_almost_equal(pred[280:293], truth) | ||
np.testing.assert_almost_equal( | ||
np.round(pred_raw, decimals=7)[280:293], truth_raw |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert_almost_equal
has a decimal
kwarg to allow matching on coarser resolution. I guess this is what you do here with the np.round
. If possible, please adjust.
pycomlink/processing/wet_dry/mlp.py
Outdated
mlp_pred = np.zeros([x_fts.shape[0], 2])*np.nan | ||
indices = np.argwhere(~np.isnan(x_fts).any(axis = 1)).ravel() | ||
|
||
if indices.size > 0: # everything is nan, mlp_pred is then all nan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this if-statement is not true if we have an all-NaN in the sample and thus also do not do any prediction. I find the comment missleading, since, if I understand correctly, it explains what happens in the case the if-statement is not true. Can you adjust to make this clearer.
Thanks for the PR! |
Yes, duplicate the sublink so that tl from channel_1 is in channel_1 and channel_2 |
I do actually, it is just a modification of "Basic CML processing workflow.ipynb", it compares the different wet/dry detection methods and includes some thoughts. It will need some review I think.. See uploaded notebook "Wet dry example" |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #146 +/- ##
==========================================
+ Coverage 74.83% 75.44% +0.60%
==========================================
Files 29 30 +1
Lines 1089 1116 +27
==========================================
+ Hits 815 842 +27
Misses 274 274
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
The notebook looks good. Some comments:
|
mlp.py: - added docstring Wet dry example.ipynb: - Do pre-processing in one cell, refer to "Basic CML processing workflow.ipynb" for more details. - Investigate two interesting CMLs - Shorten the notebook to only compare baselines.
Update: The MLP was retrained using more CMLs and a larger validation dataset. Wet dry example.ipynb: - re run notebook with retrained weights mlp.py: - updated docstring to match retrained architecture model_mlp.keras: - updated weights and architecture test_wet_dry_mlp: - updated to run with new weights
Thanks for the changes. Some minor/cosmetic suggestions for the notebook before this PR is ready to be merged from my side:
|
Rain event detection methods.ipynb: - Renamed example notebook to current name - Updated cosmetic suggestions in example notebook
@eoydvin thanks for the update @maxmargraf feel free to merge when you think it is ready (I will be offline for the next days) |
Thanks for adding this new method @eoydvin! |
This uploads the MLP method in issue #145
The network was trained on a few CMLs in Norway with reference being rainfall recorded by nearby disdrometers. I can be run similar to the existing CNN (Polz et al. 2020) by for instance using: