Convert fetching grids to use compressed zarr files #120

Open
2 of 21 tasks
mdtanker opened this issue Nov 3, 2022 · 1 comment


mdtanker commented Nov 3, 2022

Description of the desired feature:

Because of some issues with the Bedmap2 GeoTIFFs (#119), and because the Pooch cache is starting to grow very large, it would be good to have every fetch call on gridded data (GeoTIFFs, netCDFs, etc.) preprocess the files into Zarr stores. Initial testing with the Bedmap2 tiffs showed a good amount of compression: 87.8 MB for the .tif versus 39.2 MB for the .zarr.
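The preprocessing step described above could be sketched as a Pooch processor, i.e. a callable that takes `(fname, action, pooch)` and returns the path to use. This is only a minimal sketch under assumptions: the function name `grid_to_zarr` and the `.zarr` naming next to the original file are hypothetical, it assumes a netCDF input readable by `xarray.open_dataset` (a GeoTIFF would need something like rioxarray instead), and it relies on the default Zarr compression rather than tuned settings.

```python
from pathlib import Path


def grid_to_zarr(fname, action, pooch_instance):
    """Pooch-style processor: return the path of a Zarr copy of a downloaded grid.

    Pooch calls processors as processor(fname, action, pooch_instance), where
    action is "download", "update", or "fetch".
    """
    fname = Path(fname)
    zarr_path = fname.with_suffix(".zarr")  # hypothetical naming choice

    # Convert only after a fresh download/update, or if the store is missing.
    if action in ("download", "update") or not zarr_path.exists():
        import xarray as xr  # imported lazily; only needed when converting

        # Assumption: a netCDF grid. A GeoTIFF would need e.g. rioxarray.
        with xr.open_dataset(fname) as grid:
            grid.to_zarr(zarr_path, mode="w")

    return str(zarr_path)
```

A function like this could be passed as the `processor` argument of `pooch.retrieve` or `Pooch.fetch`. Note that it does nothing about the original file, which stays in the cache.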

Unfortunately, Pooch keeps the unzipped file as well as the non-preprocessed .tif files, so preprocessing all files to .zarr will only save space if we get Pooch to delete the original files.

There is some discussion of this here, here

There doesn't seem to be support for this yet with Pooch. Maybe we can just use os.remove(fname) at some point in the fetch call?

For now I will start converting to Zarr anyway.

Geotiffs

  • bedmap2
  • REMA
  • deepbedmap
  • burton-johnson GHF
  • stal GIA
  • MODIS MOA
  • LIMA

NetCDFs

  • bedmachine
  • RIS basement
  • baranov sediment thickness
  • tankersley sediment thickness
  • globsed
  • IBCSO
  • antgg
  • eigen gravity
  • earth topo
  • eigen geoid
  • aq1 GHF
  • shen crust
  • an crust
  • shen moho

Are you willing to help implement and maintain this feature?

This was referenced Nov 6, 2022

mdtanker commented Nov 8, 2022

Discovered that you can't use os.remove(fname), since Pooch uses the originally downloaded (unzipped) file names in each call, even if preprocessing has already been applied.
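One possible way around this, offered only as an untested sketch, is to check for the Zarr store before involving Pooch at all, so that later calls never look for the deleted originals. Everything here is hypothetical: `fetch_as_zarr` and `download_and_convert` are illustrative names, and `download_and_convert` stands in for whatever code calls Pooch, writes the Zarr store, and returns the paths of the original files it left behind. The trade-off is that Pooch's hash checking and update logic are skipped once the store exists.

```python
from pathlib import Path


def fetch_as_zarr(zarr_path, download_and_convert):
    """Return zarr_path, downloading and converting only if it is missing.

    download_and_convert(zarr_path) is a hypothetical callable that fetches
    the data, writes the Zarr store, and returns the original file paths.
    """
    zarr_path = Path(zarr_path)

    if not zarr_path.exists():
        originals = download_and_convert(zarr_path)
        # Safe to delete here because later calls skip the download step
        # entirely while the Zarr store is present.
        for original in originals:
            Path(original).unlink(missing_ok=True)

    return zarr_path
```

This only removes plain files; an unzipped directory would need `shutil.rmtree` instead.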
