Auto Discovery of Calculable Values #30

DocOtak · 2022-02-25T22:49:32Z

I was thinking of a feature that would be nice to have (in the short or long term): being able to give a dataset ds to the gsw wrapped functions, and gsw-xarray would then get the necessary dataarrays from ds, based on their standard_name. So one could use: gsw.sigma0(ds). This can lead to many problems (e.g. how to deal with datasets containing more than 1 array for 1 standard_name), but we can keep the discussion on how to solve these problems for later, if we decide to implement this feature.

Originally posted by @rcaneill in #1 (comment)
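A minimal sketch of the lookup described above, including the "more than 1 array for 1 standard_name" case. It uses a plain dict as a stand-in for `ds[name].attrs` so it runs without xarray; the function name `find_by_standard_name` and the variable names are hypothetical, not part of gsw-xarray.

```python
from typing import Dict, List

# Stand-in for an xarray Dataset: variable name -> attribute dict
# (mirrors ds[name].attrs; this layout is an assumption for illustration).
VariableAttrs = Dict[str, Dict[str, str]]


def find_by_standard_name(variables: VariableAttrs, standard_name: str) -> str:
    """Return the single variable name carrying the given standard_name."""
    matches: List[str] = [
        name
        for name, attrs in variables.items()
        if attrs.get("standard_name") == standard_name
    ]
    if not matches:
        raise KeyError(f"no variable with standard_name={standard_name!r}")
    if len(matches) > 1:
        # Refuse to guess when several arrays share one standard_name.
        raise ValueError(
            f"ambiguous standard_name={standard_name!r}: candidates {matches}"
        )
    return matches[0]


ds_attrs = {
    "SA": {"standard_name": "sea_water_absolute_salinity"},
    "CT": {"standard_name": "sea_water_conservative_temperature"},
    "salinity_1": {"standard_name": "sea_water_practical_salinity"},
    "salinity_2": {"standard_name": "sea_water_practical_salinity"},
}

print(find_by_standard_name(ds_attrs, "sea_water_absolute_salinity"))
```

Raising on ambiguity rather than picking the first match is only one possible choice; it keeps the failure visible until a selection mechanism is agreed on.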

DocOtak · 2022-02-25T22:57:53Z

Totally agree this would be awesome, and should be attempted (iirc it was one of our original goals).

Initial thoughts:

It might be good to rely on the cf-xarray package for attribute discovery
Many of the inputs to gsw functions are the outputs of other functions. For example, in observational datasets, you'll never have CT or SA. So when I ask for rho, I probably want this to automatically calculate the needed CT and SA values from the in situ temperature, practical salinity, pressure, and lat/lon.

rcaneill · 2022-02-26T09:23:41Z

I think there are 2 slightly different features here:

A simple one, where gsw.sigma0(ds) would take SA and CT from ds (but they need to exist)
A more complex one, where gsw_xarray tries to find a path from the existing variables to what is needed (here SA and CT). This looks like the dependency problem that GNU make solves. We don't need such a complex tool, but we can get ideas from the way it works
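The second, make-like feature can be sketched as forward chaining over a rule table: start from the variables that exist and keep applying rules until the target becomes reachable. The function names in the table follow gsw (`SA_from_SP`, `CT_from_t`, `CT_from_pt`, `sigma0`, `rho`), but the table layout, the short variable names and the `plan` helper are assumptions made for this sketch, not the gsw-xarray implementation.

```python
from typing import List, Set, Tuple

# Hypothetical rule table: (output, gsw function name, required inputs).
RULES: List[Tuple[str, str, Tuple[str, ...]]] = [
    ("SA", "SA_from_SP", ("SP", "p", "lon", "lat")),
    ("CT", "CT_from_t", ("SA", "t", "p")),
    ("CT", "CT_from_pt", ("SA", "pt")),
    ("sigma0", "sigma0", ("SA", "CT")),
    ("rho", "rho", ("SA", "CT", "p")),
]


def plan(target: str, available: Set[str]) -> List[str]:
    """Forward-chain from the available variables until target is reachable."""
    known = set(available)
    steps: List[str] = []
    progress = True
    while target not in known and progress:
        progress = False
        for output, func, inputs in RULES:
            # Skip rules whose output exists already or whose inputs are missing.
            if output in known or not all(name in known for name in inputs):
                continue
            known.add(output)
            steps.append(func)
            progress = True
            if output == target:
                break
    if target not in known:
        raise ValueError(f"cannot compute {target!r} from {sorted(available)}")
    return steps


print(plan("sigma0", {"SP", "t", "p", "lon", "lat"}))
```

Because each output is added to `known` at most once, the loop terminates even if the rule table contains cycles. The returned plan can include steps the target does not strictly need; pruning them is left out of this sketch.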

rcaneill · 2022-06-22T12:06:02Z

When trying to implement this feature, we will have a problem: many inputs of the gsw functions don't have cf standard names (e.g. saturation_fraction, entropy, enthalpy, etc). I see 2 ways to handle this:

Open a request to add them to the CF conventions (I have never done this so I don't know how it works)
Use custom criteria with cf-xarray https://cf-xarray.readthedocs.io/en/latest/custom-criteria.html
I think that these 2 options can be used together, as I guess that adding things to the CF standard names can take some time
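To illustrate the second option, here is a small stdlib-only sketch of criteria-based matching. It is loosely modelled on the idea behind cf-xarray custom criteria (a key mapped to attribute patterns), but it is not cf-xarray code: the `CRITERIA` entries, the patterns and the `match_criteria` helper are all hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical criteria for inputs that have no CF standard name:
# key -> {attribute (or "name" for the variable name): regular expression}.
CRITERIA: Dict[str, Dict[str, str]] = {
    "saturation_fraction": {"name": r"saturation_fraction|sat_frac"},
    "enthalpy": {"long_name": r".*enthalpy.*", "name": r"h|enthalpy"},
}


def match_criteria(variables: Dict[str, Dict[str, str]], key: str) -> List[str]:
    """Return variables whose name or attributes match any pattern for key."""
    found: List[str] = []
    for var_name, attrs in variables.items():
        for field, pattern in CRITERIA[key].items():
            value = var_name if field == "name" else attrs.get(field, "")
            if re.fullmatch(pattern, value, flags=re.IGNORECASE):
                found.append(var_name)
                break  # one matching field is enough for this variable
    return found


variables = {
    "h": {"long_name": "Specific enthalpy"},
    "sat_frac": {},
    "temp": {"long_name": "in situ temperature"},
}

print(match_criteria(variables, "enthalpy"))
print(match_criteria(variables, "saturation_fraction"))
```

Criteria like these could serve as a stopgap until the corresponding standard names exist, at which point the lookup could fall back to them only when no standard name is found.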

rcaneill · 2022-07-07T13:06:04Z

I am working on implementing this

rcaneill · 2022-07-07T13:47:29Z

About the API, the way I see it would be:
gsw_xarray.sigma0(ds) with ds a dataset containing (hopefully) all dataarrays necessary to compute sigma0, even with some extra steps. Does it make sense to do it this way?
It will raise a ValueError if the dataset does not contain enough information.

I guess we can also add an extra argument: gsw_xarray.sigma0(ds, inplace=True) (or False) to return a new dataset or to add the dataarrays into ds.

Question: if one user has h (enthalpy), z, lat, and SA in ds and wants to compute sigma0 (it is possible), should we save the intermediate variables necessary (e.g. CT) in ds, or should we only return the final result? I think that both options make sense: 1st option is good if the user asks again for sigma1, sigma2, sigma3 because then CT will be already computed. This could be controlled by another argument (e.g. intermediate_variables=True).
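A rough sketch of what an `intermediate_variables` switch could look like when running an ordered list of steps. Everything here is hypothetical: the `run_plan` helper, the step layout, and especially the arithmetic, which is placeholder math and not real thermodynamics.

```python
from typing import Callable, Dict, List, Tuple

# (output name, function, input names); this layout is hypothetical.
Step = Tuple[str, Callable[..., float], Tuple[str, ...]]


def run_plan(
    data: Dict[str, float],
    steps: List[Step],
    target: str,
    intermediate_variables: bool = False,
) -> Dict[str, float]:
    """Run steps in order; return the target alone or every computed value."""
    values = dict(data)  # copy, so the caller's data is never modified
    computed: List[str] = []
    for output, func, inputs in steps:
        values[output] = func(*(values[name] for name in inputs))
        computed.append(output)
    if intermediate_variables:
        return {name: values[name] for name in computed}
    return {target: values[target]}


# Placeholder arithmetic only, NOT real TEOS-10 formulas.
toy_steps: List[Step] = [
    ("CT", lambda SA, h: h / 4000.0, ("SA", "h")),
    ("sigma0", lambda SA, CT: SA - CT, ("SA", "CT")),
]
toy_data = {"h": 40000.0, "SA": 35.0}

only_result = run_plan(toy_data, toy_steps, "sigma0")
with_intermediates = run_plan(
    toy_data, toy_steps, "sigma0", intermediate_variables=True
)
print(only_result)
print(with_intermediates)
```

Returning a new mapping in both cases leaves it to the caller to decide whether the results are merged back into the original data.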

rcaneill · 2022-07-07T14:10:38Z

Here is my WIP notebook, with the algo I developed. It is not perfect, but it works well! (For now it is based on the variable names in ds, not on standard names.)

https://gist.github.com/rcaneill/0aa8b9e72112d079c4919e462a4bb378

rcaneill · 2022-07-07T14:12:21Z

I tried the other option, i.e. starting from the variable we want and going backward through the graph, but in the end it was not working (it is easy to get trapped in cycles), and it took around 10 times more lines of code. So I think that it is better to do it the way I do in the notebook.

dcherian · 2022-07-07T15:40:37Z

gsw_xarray.sigma0(ds, inplace=True) could easily just be ds.merge(gsw_xarray.sigma0(ds)), so you don't need to support the inplace keyword

DocOtak · 2022-07-14T21:07:33Z

That looks neat. I'll need to play with it a bit to understand the graph, but looks like a good starting place.

Some thoughts in no particular order:

ODV makes you specify your "key variables" when you are working with a data collection, basically you tell it which specific variable is the one it should use as, e.g. the practical salinity. My datafiles tend to have two channels of salinity, and often bottle salinity in addition to the CTD ones. We need some way of selecting the variables to use for calculations in the event of ambiguity/duplicates.
I wonder how well an implicit graph based on xarray accessors might work... if we made a gsw dataset accessor which lets you look up GSW properties as keys...

Mock example attempt:

import xarray as xr
from gsw_xarray import accessors  # or whatever is needed to register the xarray accessor under the gsw namespace

ds = xr.load_dataset("some_dataset.nc")
# Now use the accessor to get GSW properties; anything that needs intermediate calculations should just call the accessor itself.
SA = ds.gsw["SA"]  # uses PSAL or whatever it needs

rho = ds.gsw["rho"]  # internally calls ds.gsw["SA"] and ds.gsw["CT"]
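The two thoughts above (key variables and an implicit graph behind key lookup) can be sketched together in plain Python. The class below is only a stand-in for a dataset accessor: it wraps a dict instead of an xarray Dataset, and `GswLookup`, `RECIPES`, `key_variables` and the arithmetic are all hypothetical placeholders, not real TEOS-10 formulas or gsw-xarray code.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical recipes: property -> (function, properties it needs).
# The arithmetic is a placeholder, NOT real thermodynamics.
RECIPES: Dict[str, Tuple[Callable[..., float], Tuple[str, ...]]] = {
    "SA": (lambda SP: SP + 0.5, ("SP",)),
    "CT": (lambda t: t - 0.5, ("t",)),
    "rho": (lambda SA, CT: 1000.0 + SA - CT, ("SA", "CT")),
}


class GswLookup:
    """Plain-Python stand-in for key lookup such as ds.gsw["rho"]."""

    def __init__(
        self,
        data: Dict[str, float],
        key_variables: Optional[Dict[str, str]] = None,
    ) -> None:
        self._data = data
        # Maps an input name to the variable the user chose for it,
        # e.g. which of several salinity channels to use.
        self._key_variables = dict(key_variables or {})
        self._cache: Dict[str, float] = {}
        self.calls: List[str] = []  # order in which properties were computed

    def __getitem__(self, name: str) -> float:
        source = self._key_variables.get(name, name)
        if source in self._data:
            return self._data[source]
        if name in self._cache:
            return self._cache[name]
        if name not in RECIPES:
            raise KeyError(name)
        func, needs = RECIPES[name]
        # Recursive lookups form the implicit dependency graph.
        value = func(*(self[n] for n in needs))
        self._cache[name] = value
        self.calls.append(name)
        return value


data = {"salinity_1": 35.0, "salinity_2": 35.2, "t": 10.0}
lookup = GswLookup(data, key_variables={"SP": "salinity_1"})
print(lookup["rho"], lookup.calls)
```

This recursion assumes the recipes are acyclic; with cyclic recipes (for example two properties each derivable from the other) it would need a guard against revisiting a property that is already being resolved.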

rcaneill · 2022-07-29T21:43:31Z

> ODV makes you specify your "key variables" when you are working with a data collection, basically you tell it which specific variable is the one it should use as, e.g. the practical salinity. My datafiles tend to have two channels of salinity, and often bottle salinity in addition to the CTD ones. We need some way of selecting the variables to use for calculations in the event of ambiguity/duplicates.

Do you have any precise idea for this?

> I wonder how well an implicit graph based on xarray accessors might work... if we made a gsw dataset accessor which let you lookup GSW properties as keys...

My guess is that this step would be quite straightforward once the underlying function is written (though I have never written an xarray accessor before, so it may not be that easy for me)

rcaneill · 2022-07-29T21:49:13Z

While working on this I realized that I first need to implement the option to work with a dataset (e.g. gsw.sigma0(ds), where we take ds.SA and ds.CT based on standard name).
Because of TEOS-10/GSW-Python#97, it is not easy for me to know whether I should put the detected variables into args or kwargs. Any thoughts on this?

You can answer this in PR #53
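One generic way to approach the args/kwargs question is to match detected variables to parameters by name with `inspect.signature`, filling only what the caller did not pass. This is a sketch on a plain stand-in function; it makes no claim about how wrapped gsw functions behave or about the constraint in the upstream issue. `sigma0_stub` and `call_with_detected` are hypothetical names, and the arithmetic is a placeholder.

```python
import inspect
from typing import Any, Callable, Dict


def sigma0_stub(SA: float, CT: float) -> float:
    """Stand-in with the parameter names of gsw.sigma0 (placeholder math)."""
    return SA - CT


def call_with_detected(
    func: Callable[..., Any], detected: Dict[str, Any], *args: Any, **kwargs: Any
) -> Any:
    """Call func, filling parameters the caller omitted from `detected`."""
    sig = inspect.signature(func)
    bound = sig.bind_partial(*args, **kwargs)
    for name in sig.parameters:
        # Explicit arguments always win over auto-detected ones.
        if name not in bound.arguments and name in detected:
            bound.arguments[name] = detected[name]
    return func(*bound.args, **bound.kwargs)


detected = {"SA": 35.0, "CT": 10.0}
print(call_with_detected(sigma0_stub, detected))          # both detected
print(call_with_detected(sigma0_stub, detected, CT=5.0))  # CT overridden
```

Binding by name means it does not matter whether the user passed a value positionally or by keyword, which avoids having to track two separate containers.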

rcaneill added this to the Version 0.3.0 milestone Feb 26, 2022

rcaneill modified the milestones: Version 0.3.0, Version 0.4.0 Mar 23, 2022

rcaneill self-assigned this Jul 7, 2022

rcaneill mentioned this issue Jul 30, 2022

Autodect parameters from dataset -- add dataset accessor #53

Merged

rcaneill modified the milestones: Version 0.4.0, Version 1.0.0 Nov 29, 2022

rcaneill mentioned this issue Nov 29, 2022

What is left for version 1.0 ? #56

Open

7 tasks

rcaneill mentioned this issue Jan 10, 2023

Inputs arguments that have no cf standard names #57

Open
