added a simple MLP neural network for wet-dry classification #146

Merged
maxmargraf merged 7 commits into pycomlink:master on Jan 3, 2024

Contributor

@eoydvin eoydvin commented Dec 5, 2023

This uploads the MLP method in issue #145

The network was trained on a few CMLs in Norway, with the reference being rainfall recorded by nearby disdrometers. It can be run similarly to the existing CNN (Polz et al. 2020), for instance using:

import numpy as np
import xarray as xr
from tqdm import tqdm

# mlp_wet_dry is the function added in this PR (mlp.py)

# Output variables: binary wet/dry flag and probability of wet
cml["wet_oydvin"] = xr.full_like(cml.tl, np.nan)
cml["wet_p_oydvin"] = xr.full_like(cml.tl, np.nan)

for cmlid in tqdm(cml.cml_id):
    cml_tmp = cml.sel(cml_id=cmlid)
    # cml_tmp = cml_tmp.resample(time="1min").first().to_dataset()

    # One prediction per CML, based on the total loss of both sublinks
    mlp_out = mlp_wet_dry(
        cml_tmp.isel(sublink_id=0).tl.values,
        cml_tmp.isel(sublink_id=1).tl.values,
    )
    wet_p = mlp_out[:, 1]  # probability of wet
    wet = np.argmax(mlp_out, axis=1)  # 1 where wet is the more likely class

    cml_tmp["wet"] = xr.full_like(cml_tmp.tl, np.nan)
    cml_tmp["wet_p"] = xr.full_like(cml_tmp.tl, np.nan)

    # The same prediction is stored for both sublinks
    for sublink in ["sublink_1", "sublink_2"]:
        cml_tmp["wet_p"].loc[{"sublink_id": sublink}] = wet_p
        cml_tmp["wet"].loc[{"sublink_id": sublink}] = wet

    cml["wet_p_oydvin"].loc[dict(cml_id=cmlid)] = cml_tmp.wet_p
    cml["wet_oydvin"].loc[dict(cml_id=cmlid)] = cml_tmp.wet

@cchwala
Contributor

cchwala commented Dec 5, 2023

Thanks @eoydvin 👍

I will add some (probably minor) comments.

One question: Do you have a notebook where you show the application? We do not yet have a notebook that compares the different wet-dry classification methods. But maybe you could start by adding a very simple and minimal notebook for your method. Then we can later add the other methods.

Contributor

@cchwala cchwala left a comment

Looks good. I only have some minor comments.

threshold=None, # 0.5 is often good, or argmax
):
"""
Wet dry classification using a simple neural network based on channel 1 and channel 2 of a CML
Contributor

Can you state here some more details or is there a document that you can reference?

E.g. what are the details of the network (MLP, but how many neurons and layers)? What is the sample length, i.e. what is the minimum length of the time series that has to be supplied? Please explain if and how the model is applied in a sliding window. How are NaNs handled?

I know that the CNN wet-dry method also has very little info in its docstring, but it has the paper with many details (not saying that we need a paper or something similar here...).
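To make the sliding-window and minimum-length questions concrete, here is a minimal sketch of how windowed features could be built from two channels. It is not the implementation in this PR: the function name `make_windows` and the parameter `window_length` are hypothetical, and the actual window length and feature layout of the MLP are not stated in this thread. Only the name `x_fts` is borrowed from the diff.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(trsl_channel_1, trsl_channel_2, window_length):
    """Stack trailing windows of both channels into a feature matrix.

    Hypothetical helper for illustration. Row i holds the window_length
    samples ending at time step i for channel 1, followed by the same for
    channel 2. The first window_length - 1 rows stay NaN because their
    window is incomplete.
    """
    ch1 = np.asarray(trsl_channel_1, dtype=float)
    ch2 = np.asarray(trsl_channel_2, dtype=float)
    n = ch1.size
    x_fts = np.full((n, 2 * window_length), np.nan)
    # A series shorter than one window yields no complete sample at all
    if n >= window_length:
        w1 = sliding_window_view(ch1, window_length)
        w2 = sliding_window_view(ch2, window_length)
        x_fts[window_length - 1:] = np.hstack([w1, w2])
    return x_fts


# Made-up example: 5 time steps, window of 3 samples
x_fts = make_windows(np.arange(5), np.arange(5) * 10, window_length=3)
```

Under these assumptions the minimum usable series length equals `window_length`, and any NaN inside a window makes that row incomplete, which is the kind of detail the docstring could state in one or two lines.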

Contributor Author

I will provide this somehow. Max gave me the idea of publishing it as a technical note somewhere.

Contributor

It can just be 3-4 lines of text in the docstring. That will be sufficient. But right now the user has absolutely no idea what the function uses. Of course, feel free to write a "technical note" paper any time ;-)

trsl_channel_2 : iterable of float
Time series of received signal level of channel 2
threshold : float
Threshold (0 - 1) for setting event as wet or dry.
Contributor

It would be good to have the option of setting this to None and returning the continuous output instead of the binary one derived with the threshold. The threshold can easily be applied later, and this might make it easier to create a ROC curve, where you want to sweep over the thresholds.

update: just saw that None is the default, but your docstring does not tell us that ;-)
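A minimal sketch of applying the threshold afterwards and sweeping it for a ROC curve, assuming the continuous output is a wet probability between 0 and 1. The helper names `apply_threshold` and `roc_points` and the example data are made up for illustration and are not part of pycomlink.

```python
import numpy as np


def apply_threshold(wet_prob, threshold):
    """Turn wet probabilities into a wet (1) / dry (0) flag, keeping NaN."""
    wet_prob = np.asarray(wet_prob, dtype=float)
    wet = (wet_prob > threshold).astype(float)
    wet[np.isnan(wet_prob)] = np.nan
    return wet


def roc_points(wet_prob, reference_wet, thresholds):
    """Return false and true positive rates for each threshold."""
    wet_prob = np.asarray(wet_prob, dtype=float)
    reference_wet = np.asarray(reference_wet, dtype=bool)
    # Time steps without a prediction are excluded from the statistics
    valid = ~np.isnan(wet_prob)
    prob, ref = wet_prob[valid], reference_wet[valid]
    fpr, tpr = [], []
    for thr in thresholds:
        pred = prob > thr
        tpr.append(np.sum(pred & ref) / max(np.sum(ref), 1))
        fpr.append(np.sum(pred & ~ref) / max(np.sum(~ref), 1))
    return np.array(fpr), np.array(tpr)


# Made-up example data
wet_prob = np.array([0.1, 0.8, np.nan, 0.6, 0.3, 0.9])
reference_wet = np.array([0, 1, 0, 1, 0, 1])
fpr, tpr = roc_points(wet_prob, reference_wet, np.linspace(0, 1, 11))
```

Keeping the function output continuous means the same prediction can be reused for every threshold in the sweep, so the model only has to run once.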


np.testing.assert_almost_equal(pred[280:293], truth)
np.testing.assert_almost_equal(
np.round(pred_raw, decimals=7)[280:293], truth_raw
Contributor

assert_almost_equal has a decimal kwarg to allow matching on coarser resolution. I guess this is what you do here with the np.round. If possible, please adjust.
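For illustration, a small sketch of the two variants with made-up numbers (the arrays below are not the values from the test in this PR). Note that `decimal` already defaults to 7 in `np.testing.assert_almost_equal`.

```python
import numpy as np

# Made-up values standing in for the raw model output and the stored truth
pred_raw = np.array([0.12345678, 0.98765432, 0.50000004])
truth_raw = np.array([0.1234568, 0.9876543, 0.5])

# Variant with explicit rounding before the comparison
np.testing.assert_almost_equal(np.round(pred_raw, decimals=7), truth_raw)

# Variant that passes the precision to the assertion directly
np.testing.assert_almost_equal(pred_raw, truth_raw, decimal=7)

# A coarser match only needs a smaller value for decimal
np.testing.assert_almost_equal(pred_raw, truth_raw, decimal=3)
```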

mlp_pred = np.zeros([x_fts.shape[0], 2])*np.nan
indices = np.argwhere(~np.isnan(x_fts).any(axis = 1)).ravel()

if indices.size > 0: # everything is nan, mlp_pred is then all nan
Contributor

If I understand correctly, this if-statement is not true if we have all-NaN in the sample, and then no prediction is done. I find the comment misleading since, if I understand correctly, it explains what happens when the if-statement is not true. Can you adjust it to make this clearer?
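One possible way to phrase this logic so that the comment describes the branch it sits in is sketched below. `predict_skip_nan` and `dummy_predict` are hypothetical names; `dummy_predict` only stands in for the real model call, which is not shown in this thread.

```python
import numpy as np


def predict_skip_nan(x_fts, predict):
    """Run predict only on rows without NaN; all other rows stay NaN."""
    x_fts = np.asarray(x_fts, dtype=float)
    pred = np.full((x_fts.shape[0], 2), np.nan)
    valid = ~np.isnan(x_fts).any(axis=1)
    # Only call the model if at least one row is NaN-free.
    # If every row contains NaN, pred is returned as all-NaN.
    if valid.any():
        pred[valid] = predict(x_fts[valid])
    return pred


def dummy_predict(x):
    """Hypothetical stand-in for the model: returns (p_dry, p_wet) per row."""
    p_wet = 1.0 / (1.0 + np.exp(-x.mean(axis=1)))
    return np.column_stack([1.0 - p_wet, p_wet])


# Made-up example: the second row contains a NaN and is skipped
x = np.array([[0.0, 0.0], [np.nan, 1.0], [2.0, 2.0]])
out = predict_skip_nan(x, dummy_predict)
```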

@cchwala cchwala changed the title from "added a simple neural network" to "added a simple MLP neural network for wet-dry classification" on Dec 5, 2023
@maxmargraf
Contributor

Thanks for the PR!
What about the case when only one sublink is available? Is it advisable to duplicate the one sublink to get two, or should this not be done anyway?

@eoydvin
Contributor Author

eoydvin commented Dec 5, 2023

> Thanks for the PR! What about the case when only one sublink is available? Is it advisable to duplicate the one sublink to get two, or should this not be done anyway?

Yes, duplicate the sublink so that tl from channel_1 is passed as both channel_1 and channel_2.
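A minimal sketch of that fallback, assuming the total loss is available as an array of shape (n_sublinks, n_time). The helper `select_channels` is hypothetical and not part of pycomlink.

```python
import numpy as np


def select_channels(tl):
    """Return (channel_1, channel_2) from a (n_sublinks, n_time) array.

    With a single sublink, the same series is used for both channels.
    """
    tl = np.atleast_2d(np.asarray(tl, dtype=float))
    channel_1 = tl[0]
    channel_2 = tl[1] if tl.shape[0] > 1 else tl[0]
    return channel_1, channel_2


# Made-up example with one and with two sublinks
one_a, one_b = select_channels([[50.1, 50.3, 52.0]])
two_a, two_b = select_channels([[50.1, 50.3, 52.0], [49.8, 50.0, 51.7]])

# The pair can then be passed on, e.g. mlp_wet_dry(*select_channels(tl))
```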

@eoydvin
Contributor Author

eoydvin commented Dec 5, 2023

> Thanks @eoydvin 👍
>
> I will add some (probably minor) comments.
>
> One question: Do you have a notebook where you show the application? We do not yet have a notebook that compares the different wet-dry classification methods. But maybe you could start by adding a very simple and minimal notebook for your method. Then we can later add the other methods.

I do, actually. It is just a modification of "Basic CML processing workflow.ipynb": it compares the different wet/dry detection methods and includes some thoughts. I think it will need some review. See the uploaded notebook "Wet dry example".


codecov bot commented Dec 5, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (28d54a4) 74.83% compared to head (b7c99d4) 75.44%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #146      +/-   ##
==========================================
+ Coverage   74.83%   75.44%   +0.60%     
==========================================
  Files          29       30       +1     
  Lines        1089     1116      +27     
==========================================
+ Hits          815      842      +27     
  Misses        274      274              
Flag Coverage Δ
unittests 75.44% <100.00%> (+0.60%) ⬆️

@cchwala
Contributor

cchwala commented Dec 5, 2023

The notebook looks good. Some comments:

  • The notebook is a bit busy. Maybe you can clean up the initial part that is similar to the example processing notebook, i.e. just state that it is similar and then remove all (or most of) the output.
  • The wet-dry comparison part is good, but it would be nice to see a second or even third CML, maybe also one that is more challenging due to noisiness.
  • I am not sure why you do the full processing of all CMLs at the bottom. You do not do an analysis of the results. Maybe you can remove this part at the bottom?
  • I would keep it simple for now. Sooner or later there should be a comparison and validation with the reference data. But that will take too long now and is out of scope of this PR because it will require some wet-dry performance metrics and plots. I would keep that for later because @maxmargraf is doing similar things for the method intercomparison and we might be able to merge some code from him into pycomlink (or into the upcoming ragali) for wet-dry validation.

mlp.py:
 - added docstring

Wet dry example.ipynb:
 - Do pre-processing in one cell, refer to  "Basic CML processing workflow.ipynb" for more details.
 - Investigate two interesting CMLs
 - Shorten the notebook to only compare baselines.
Update: The MLP was retrained using more CMLs and a larger validation dataset.

Wet dry example.ipynb:
- re-ran notebook with retrained weights

mlp.py:
- updated docstring to match retrained architecture

model_mlp.keras:
- updated weights and architecture

test_wet_dry_mlp:
- updated to run with new weights
@maxmargraf
Contributor

Thanks for the changes. Some minor/cosmetic suggestions for the notebook before this PR is ready to be merged from my side:

  • Could you rename the notebook to be more specific: e.g. "Rain event detection methods"
  • add a legend to the two plots labeling TL and baseline
  • add plt.tight_layout() or remove overlapping text
  • remove blank cells at the bottom of the notebook
  • Notes could be in markdown

Rain event detection methods.ipynb:
 - Renamed example notebook to current name
 - Updated cosmetic suggestions in example notebook
@cchwala
Contributor

cchwala commented Jan 3, 2024

@eoydvin thanks for the update

@maxmargraf feel free to merge when you think it is ready (I will be offline for the next days)

@maxmargraf maxmargraf merged commit 94b612c into pycomlink:master Jan 3, 2024
5 checks passed
@maxmargraf
Contributor

Thanks for adding this new method @eoydvin!
