open_datatree performance improvement on NetCDF, H5, and Zarr files #9014

Open
wants to merge 32 commits into base: main

@aladinor commented May 7, 2024

open_datatree performance improvement on NetCDF files

welcome bot commented May 7, 2024

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@TomNicholas TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label May 8, 2024
@TomNicholas TomNicholas added this to In progress in DataTree integration via automation May 8, 2024
@Illviljan Illviljan added the run-benchmark Run the ASV benchmark workflow label May 10, 2024
@aladinor aladinor changed the title open_datatree performance improvement on NetCDF files open_datatree performance improvement on NetCDF and Zarr files May 10, 2024
@flamingbear (Contributor) left a comment

I had thoughts about the legacyhdf5 api and how it might be incorporated.

xarray/backends/netCDF4_.py (resolved)
aladinor and others added 3 commits May 28, 2024 17:07
@aladinor aladinor requested a review from flamingbear May 29, 2024 01:34
A comment by @aladinor was marked as outdated.

@aladinor aladinor changed the title open_datatree performance improvement on NetCDF and Zarr files open_datatree performance improvement on NetCDF, H5, and Zarr files May 29, 2024
xarray/backends/zarr.py (outdated, resolved)
xarray/backends/zarr.py (outdated, resolved)
…g group variable typing hints (str | Iterable[str] | callable) under the open_datatree for h5 files. Finally, separating positional from keyword args
…ding group variable typing hints (str | Iterable[str] | callable) under the open_datatree method for netCDF files
…ding group variable typing hints (str | Iterable[str] | callable) under the open_datatree method for zarr files
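The commits above type the `group` argument as `str | Iterable[str] | callable`. The sketch below shows one way a parameter of that shape could be normalized. It assumes the argument acts as a filter over group paths in a nested hierarchy. The helper names (`iter_group_paths`, `select_groups`) and the dict-based group layout are hypothetical stand-ins for illustration, not xarray's implementation or API.

```python
from typing import Any, Callable, Iterable, Iterator, List, Mapping, Optional, Union

# Hypothetical group layout for illustration only: each group is a dict whose
# optional "groups" key maps child names to child groups.
GroupTree = Mapping[str, Any]

# Shape of the argument described in the commit messages (None = "all groups").
GroupSpec = Optional[Union[str, Iterable[str], Callable[[str], bool]]]


def iter_group_paths(tree: GroupTree, parent: str = "/") -> Iterator[str]:
    """Yield `parent` and then every descendant group path, depth-first."""
    yield parent
    for name, child in tree.get("groups", {}).items():
        # Join without doubling the slash when the parent is the root "/".
        path = parent.rstrip("/") + "/" + name
        yield from iter_group_paths(child, path)


def select_groups(tree: GroupTree, group: GroupSpec = None) -> List[str]:
    """Return the group paths matched by a str, an iterable of str, or a predicate."""
    paths = list(iter_group_paths(tree))
    if group is None:
        return paths
    if callable(group):
        # Predicate: keep every path for which it returns True.
        return [p for p in paths if group(p)]
    # A single string is one exact path. Check it before the generic
    # iterable case so "/a" is not split into characters.
    wanted = {group} if isinstance(group, str) else set(group)
    # Preserve traversal order rather than the caller's order.
    return [p for p in paths if p in wanted]
```

With `tree = {"groups": {"a": {"groups": {"b": {}}}, "c": {}}}`, `select_groups(tree, lambda p: p.startswith("/a"))` returns `["/a", "/a/b"]`. The `isinstance(group, str)` check has to come before the generic iterable branch, because a string is itself an iterable of characters.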
@dcherian dcherian requested a review from kmuehlbauer June 4, 2024 16:11
@flamingbear (Contributor) left a comment

This looks great. Thanks for working through the dual library stuff with me.

@TomNicholas (Contributor)

Yes, very excited by this! Two final things:

  • This deserves a whats-new.rst entry!
  • Would you be willing to add a benchmark test? You can see here how we benchmark opening and loading a single netCDF file:

def time_load_dataset_netcdf4(self):

Alternatively, we could leave adding that benchmark to a separate PR?

Labels: io, run-benchmark (Run the ASV benchmark workflow), topic-backends, topic-DataTree (Related to the implementation of a DataTree class), topic-performance
Development

Successfully merging this pull request may close these issues.

Improving performance of open_datatree
7 participants