Investigate combining datasets for dssterrapixel.plate and DSSPngL5to12_* #88

Open
twsouthwick opened this issue Sep 21, 2020 · 5 comments

Comments

@twsouthwick
Collaborator

Worth keeping in mind here that the DSS dataset is splintered into these sub-plate-files only because of the previous 4 GB file size limit. In blob storage, the handling could be simplified to pull data from one large pyramid, rather than splitting between the single file for levels 0-8 (dssterrapixel.plate) and the collection of plates for levels 5-12 (DSSpngL5to12_x{1}_y{2}.plate).

Relevant here is that I'm pretty sure that our wwtfiles blob container already contains the loose PNG files as a single 0-12 tile pyramid in the dsstp container.

Originally posted by @pkgw in #87 (comment)

@pkgw
Contributor

pkgw commented Sep 21, 2020

Quick update from reviewing my notes from inventorying the storage containers:

  • wwtfiles/dsstp is a subset: TOAST levels 8-12, ~21 million blobs, 727 GiB
  • wwtfiles/dss appears to be the whole shebang: TOAST levels 0-12, 22.3 million blobs, 800 GiB

(Note that level N+1 is 4× as big as level N, so even though levels 0 to 7 are more than half of the levels, their total size is much smaller than level 12.)
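To make the arithmetic concrete, here is a minimal sketch that assumes the pyramid is a plain quadtree with one tile at level 0 and 4× as many tiles at each deeper level. The function names are made up for illustration, and the sketch counts tiles only; it says nothing about bytes per tile.

```python
def tiles_at_level(level: int) -> int:
    """Tile count at one level, assuming 1 tile at level 0 and 4x per level."""
    return 4 ** level


def tiles_in_range(lo: int, hi: int) -> int:
    """Total tile count for levels lo..hi inclusive."""
    return sum(tiles_at_level(level) for level in range(lo, hi + 1))


full_pyramid = tiles_in_range(0, 12)   # levels 0-12
shallow = tiles_in_range(0, 7)         # levels 0-7 combined
deepest = tiles_at_level(12)           # level 12 alone

print(full_pyramid)  # 22369621
print(shallow)       # 21845
print(deepest)       # 16777216
```

Under that assumption the full 0-12 pyramid comes to about 22.4 million tiles, in line with the blob count noted for wwtfiles/dss, and levels 0 to 7 together are roughly 0.1% of the tiles in level 12.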

The file name format in dss is DSSTerraPixelL{L}X{X}Y{Y}.png
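As a rough illustration of that naming scheme, the sketch below builds and parses blob names. It assumes that `{L}`, `{X}`, and `{Y}` are plain decimal integers without zero padding, which the notes above do not confirm; `blob_name` and `parse_blob_name` are hypothetical helpers, not part of any existing codebase.

```python
import re

# Assumption: level and tile indices are unpadded decimal integers.
_NAME_RE = re.compile(r"^DSSTerraPixelL(\d+)X(\d+)Y(\d+)\.png$")


def blob_name(level: int, x: int, y: int) -> str:
    """Build a blob name following DSSTerraPixelL{L}X{X}Y{Y}.png."""
    return f"DSSTerraPixelL{level}X{x}Y{y}.png"


def parse_blob_name(name: str) -> tuple[int, int, int]:
    """Return (level, x, y) from a blob name, or raise ValueError."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a DSS tile name: {name!r}")
    level, x, y = (int(group) for group in match.groups())
    return level, x, y


print(blob_name(12, 100, 2047))                      # DSSTerraPixelL12X100Y2047.png
print(parse_blob_name("DSSTerraPixelL8X3Y250.png"))  # (8, 3, 250)
```

A round trip through both helpers is a cheap way to check the padding assumption against a sample of real blob names before relying on it.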

@pkgw
Contributor

pkgw commented Nov 5, 2020

Now that we can work with plate files in blob storage, I think we can close this issue for the time being.

pkgw closed this as completed Nov 5, 2020

@twsouthwick
Collaborator Author

@pkgw Is the 4 GB file size limit due to the file format itself? It would be great to combine them all into a single plate file.

@pkgw
Contributor

pkgw commented Nov 5, 2020

I'm not sure. I thought it was due to filesystem issues, but from some quick Wikipedia-ing it looks like NTFS has supported files larger than 4 GiB for a long time? And I assume that would be the relevant filesystem.

@twsouthwick
Collaborator Author

Hmm, that could be interesting. Azure blobs can also be very large (terabytes), so this could simplify plate files. That would make it much simpler to have a data-driven endpoint.

I'm going to reopen this to track if it's possible - probably longer term, so feel free to close again.

twsouthwick reopened this Nov 5, 2020