Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validity checker #25

Open
3 tasks
mvdh7 opened this issue Apr 20, 2020 · 11 comments
Open
3 tasks

Add validity checker #25

mvdh7 opened this issue Apr 20, 2020 · 11 comments
Labels
enhancement New feature or request

Comments

@mvdh7
Copy link
Owner

mvdh7 commented Apr 20, 2020

Return logical arrays indicating whether calculated results fall in valid input ranges (e.g. of temperature, salinity and pressure) given the sets of constants used.

One way to do this:

  • Add validity output to equilibrium constant functions
  • Accumulate validity outputs through the different calculations
  • Report validity results in main CO2SYS output

Another way would be to build a separate function that just checks validity, rather than adding extra outputs to the equilibrium constant functions and so on. This is less robust but less breaking?

@mvdh7 mvdh7 added the enhancement New feature or request label Apr 20, 2020
@lukegre
Copy link
Contributor

lukegre commented Apr 28, 2020

I might be able to help out here.

I've done something similar for SeaFlux, but it only raises error's at the moment. Could rather raise a warning at the end with a quality flag column?

I also noticed that the original version didn't pick up when there are negative pCO2 values (might seem ridiculous, but it was machine learning output gone wrong). Would be easy to add these simple checks.

@mvdh7
Copy link
Owner Author

mvdh7 commented Apr 28, 2020

Thanks! I was imagining something that would not prevent the calculations from taking place, but would just return a logical array or flag like you suggest for each equilibrium constant indicating whether the conditions used fall within its valid range, with a warning at the end.

I'd prefer for this to be built as a separate function to the main CO2SYS program, but with a similar set of inputs/a switch to turn it on from the main program. Instinctively I also don't want to add a second output to all the equilibrium constant functions, but rather to make a second set of functions that just do validity checks. But I'm not certain and I'm open to an alternative argument.

I just eliminated some negative pCO2s today that arise from having the pH - alkalinity input pair with pH so high that the non-carbonate alkalinity that you calculate from it is greater than input alkalinity, which leads to negative bicarb + 2*carb and hence negative DIC and pCO2. These input values are physically impossible so I have NaN'd the subsequent results out if they are given. If you find other cases that give negative pCO2s then please do let me know. I'd rather figure out why each one happens individually and cut it off in the appropriate place rather than just blindly remove them.

Aside - did you see the commits I tagged you in at the end of last week?

@lukegre
Copy link
Contributor

lukegre commented Apr 30, 2020

Aside - did you see the commits I tagged you in at the end of last week?

Nope, not sure I've got the notifications turned on. Went looking, couldn't find them. In what repo?

@lukegre
Copy link
Contributor

lukegre commented Apr 30, 2020

I'd prefer for this to be built as a separate function to the main CO2SYS program, but with a similar set of inputs/a switch to turn it on from the main program. Instinctively I also don't want to add a second output to all the equilibrium constant functions, but rather to make a second set of functions that just do validity checks. But I'm not certain and I'm open to an alternative argument.

In a separate module (still a part of PyCO2SYS) makes most sense I would say. Wouldn't be difficult to achieve. Might be able to add a flag and comment column, where comment gives information about the processing.

@mvdh7
Copy link
Owner Author

mvdh7 commented Apr 30, 2020

Aside - did you see the commits I tagged you in at the end of last week?

Nope, not sure I've got the notifications turned on. Went looking, couldn't find them. In what repo?

532fb0e
b6be9ba

@mvdh7
Copy link
Owner Author

mvdh7 commented Apr 30, 2020

I'd prefer for this to be built as a separate function to the main CO2SYS program, but with a similar set of inputs/a switch to turn it on from the main program. Instinctively I also don't want to add a second output to all the equilibrium constant functions, but rather to make a second set of functions that just do validity checks. But I'm not certain and I'm open to an alternative argument.

In a separate module (still a part of PyCO2SYS) makes most sense I would say. Wouldn't be difficult to achieve. Might be able to add a flag and comment column, where comment gives information about the processing.

Sounds sensible - very happy for you to do this!

@lukegre
Copy link
Contributor

lukegre commented May 17, 2020

Thanks! I was imagining something that would not prevent the calculations from taking place, but would just return a logical array or flag like you suggest for each equilibrium constant indicating whether the conditions used fall within its valid range, with a warning at the end.

What about using numpy.ma.masked_array for the final output. A lot neater than using bool masks. Users would still be able to get the output with arr.data, but the default would be masked.

@mvdh7
Copy link
Owner Author

mvdh7 commented May 18, 2020

What about using numpy.ma.masked_array. A lot neater than using bool masks.

Nice idea, not very familiar with it myself, but that's no reason not to use it. Could it track which constants were (in)valid, or does it just return a single 1D mask for each output variable?

@lukegre
Copy link
Contributor

lukegre commented Jul 11, 2020

Hey, I've had a bit of time recently to think about this again. Using decorators, or a similar approach might be quite nice. Not sure if you've used these, but there would be two options with a decorator:

My suggestion

func = kH2CO3_SWS_HM_DM87  # dissociation constant funciton
# create a unit_checker function that will check the data but not run the function
# names of kwargs have to match the function inputs (much more informative for metadata output)
output = unit_checker(func, TempK=(-2, 50), Sal=[5, 50])(temperature_data, salinity_data)
# returns bool mask array and a string array that tells the user which input is outside the limit
mask_for_TempK_and_Sal, string_info_about_temp_or_sal_limits = output
# mask is True where outside limits - can easily be changed
>>> output
(array([False, True, True, True], dtype=object),
 array([
    '', 
    'kH2CO3_SWS_HM_DM87: Sal > 2; ',
    'kH2CO3_SWS_HM_DM87: Sal < -2; ', 
    'kH2CO3_SWS_HM_DM87: Sal > 2; '
], dtype=object))

This could be taken further, that you don't have to supply TempK and Sal explicitly, as this can be stored in a dictionary associated with the name of the function.

I've written most of the code to do this, so let me know what you think about this way of doing it.
I will implement this in SeaFlux in the next while so you can what I mean.

@lukegre
Copy link
Contributor

lukegre commented Jul 12, 2020

So, after some tinkering I've implemented the method for SeaFlux. It's a bit buggy at the moment, but I think it's a really useful tool for most scientific software.

Here's a link to a notebook showing how I've implemented it. I think a bit more testing will have to be done before implementing this in PyCO2SYS.
https://github.com/luke-gregor/SeaFlux/blob/develop/notebooks/limit_check_testing.ipynb

A much simpler version (as shown in my previous comment) would be really easy to implement.

@mvdh7
Copy link
Owner Author

mvdh7 commented Jul 16, 2020

This looks like a really neat way to do it, great idea. Will try to have a more detailed look when I get a chance, let me know if there's anything specific I can do to help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants