Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert to an xarray 'decorator' #25

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Convert to an xarray 'decorator' #25

wants to merge 4 commits into from

Conversation

khaeru
Copy link
Owner

@khaeru khaeru commented Sep 27, 2016

Will close #18

TODO:

  • Check basic xarray functionality has sensible behaviour with lazy=True.
  • Update documentation.

- add new open_dataset() method
- override xr.Dataset.__getitem__
- minimum changes to pass tests
This approach uses the existing code, instead of repeating it with modifications (which would make maintenance harder).
@khaeru khaeru changed the title Convert to an xarray custom decorator (issue #18) Convert to an xarray 'decorator' Sep 27, 2016
@khaeru
Copy link
Owner Author

khaeru commented Sep 27, 2016

Hi @shoyer —if you can spare the time, I'd like to ask for a brief pointer on something here.

In this branch I've converted py-GDX from a xr.Dataset subclass to an accessor, as supported by xarray 0.8. (As you can see from the first commit, the changes required were minimal! Thanks!)

One of the features of the older version was 'lazy' or deferred loading—individual variables would not be read from the underlying file until they were accessed. As you can see, in this branch I had to override (or wrap, basically) xr.Dataset.__getitem__ to preserve that feature.

So this does not really respect the spirit of an accessor. It also didn't support, e.g., the in operator. Before going down the rabbit hole of overriding more of the core xarray functionality, I noticed that some of the other packages linked from the documentation, like xgcm, use a subclass of xr.backends.common.AbstractDataStore; and (though I've never used it) I know that out-of-core computation, i.e. deferred loading, is the whole point of dask support in xarray.

So my question is basically, what is the minimum viable form of a DataStore? Would I also need to subclass or otherwise modify Variable to support a new backend?

Any advice would be much appreciated!

@shoyer
Copy link

shoyer commented Sep 28, 2016

I can't recommend using xarray's DataStore API here, and in fact, I'm pretty sure it isn't even really needed in xgcm. In addition to being a sloppy abstraction (mixing an abstract base class with some highly specific implementation details), it just doesn't do that much beyond loading a dictionary of xarray.Variable objects and setting _file_obj so Dataset.close() works. Seriously, look at the load_store method.

Dask is a nice way to make lazy computed arrays (with eagerly known shape), but it seems like you may need lazily computed keys? If so, I'm afraid nothing in xarray currently solves this problem and a Dataset subclass is actually probably the cleanest solution.

@khaeru
Copy link
Owner Author

khaeru commented Sep 28, 2016

That's helpful, thanks!

I suppose for the time being I will stick with the accessor approach. For the longer run, I will look more closely into dask and consider whether it offers a solution.

@khaeru khaeru self-assigned this Jan 4, 2021
@khaeru khaeru removed their assignment Feb 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert to xarray 'decorator'
2 participants