HDF5 files are confusing #1468

Open
eschnett opened this issue Jun 27, 2023 · 2 comments
@eschnett
Contributor

We looked at an HDF5 file that was generated by openPMD and were very confused that there were so many zeros in the file (see EinsteinToolkit/CarpetX#152), suspecting a bug in our code. The "solution" was that openPMD extends all datasets to cover the whole domain and fills in the undefined regions with zeros.

I don't think that using the value 0 to indicate an undefined value is convenient. This makes the resulting HDF5 files confusing at best. I recommend using H5Pset_fill_value() to set a different fill value (NaN?) for undefined points.
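To illustrate why a zero fill is ambiguous, here is a minimal pure-Python sketch that models a dataset as a list rather than calling HDF5. The helper names (make_dataset, write_patch, written_mask) and the sample values are hypothetical and are not part of openPMD or HDF5. It assumes floating-point data that never contains NaN itself; for integer data or data with genuine NaNs a different sentinel would be needed.

```python
import math


def make_dataset(n_points, fill):
    """Model a freshly allocated 1D dataset: every point holds the fill value."""
    return [fill] * n_points


def write_patch(dataset, start, values):
    """Overwrite dataset[start:start+len(values)], similar to a hyperslab write."""
    dataset[start:start + len(values)] = values


def written_mask(dataset):
    """Guess which points were written, assuming NaN marks unwritten points."""
    return [not math.isnan(v) for v in dataset]


# Interior data that legitimately contains a zero (hypothetical values).
interior = [1.5, 0.0, -2.0]

zero_filled = make_dataset(5, 0.0)
nan_filled = make_dataset(5, math.nan)

# Only the interior is written; the two boundary points stay undefined.
write_patch(zero_filled, 1, interior)
write_patch(nan_filled, 1, interior)

print(zero_filled)               # boundary zeros look like real data
print(written_mask(nan_filled))  # boundary points are recognizable
```

With the zero fill, the physical zero at index 2 cannot be told apart from the unwritten boundary points, whereas the NaN fill lets a reader reconstruct which points were written.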

@eschnett
Contributor Author

Note that this was a dataset with mesh refinement data so that refined levels contain a lot of zeros. However, even on the coarse grid (that covers the whole domain) we did not write boundary data, and those undefined points were turned into zeros.

@franzpoeschel
Contributor

franzpoeschel commented Jun 28, 2023

Reference to the mesh refinement PR of the openPMD standard: https://github.com/openPMD/openPMD-standard/pull/252/files

If the implemented file format supports sparse data sets, i.e. through efficient chunking of patches, the refined level must cover the previous level in extent and store multiple patches through its chunking mechanism.

File formats that do not support efficient storage of a sparsely populated refinement level can store contiguous patches on the same level with an additional suffix _<P> where <P> is the number of the (hyperrectangular) patch in the refinement level.

The implementation of mesh refinement is more difficult in file backends whose data representations have no good native support for sparse datasets. Unfortunately, HDF5 is one of them.

openPMD itself does not even specify a fill value (in fact, we specify H5Pset_fill_time(datasetCreationProperty, H5D_FILL_TIME_NEVER)); the zeros are something that HDF5 produces on its own.

It should not be too difficult to add an option that would allow users to specify a custom fill value. However, I doubt that it would be really useful for the use case as you would still get n refinement levels that use disk space corresponding to the full domain. (EDIT: I think that chunking can avoid this?) Also, HDF5 has no straightforward way to recover the regions that were actually written.
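As a rough way to reason about the chunking question, the sketch below counts how many chunks a single hyperrectangular patch overlaps compared with the number of chunks in the full domain. It is plain Python and does not call HDF5. It rests on the assumption that a chunked dataset only allocates storage for chunks that are actually written, which I believe holds for serial HDF5 with its default allocation settings but may not hold in every configuration (parallel I/O in particular). The function names and the domain, chunk and patch sizes are made up for illustration.

```python
def chunks_touched(patch_start, patch_shape, chunk_shape):
    """Count the chunks overlapped by a hyperrectangular patch."""
    count = 1
    for start, extent, chunk in zip(patch_start, patch_shape, chunk_shape):
        first = start // chunk                 # index of first chunk hit
        last = (start + extent - 1) // chunk   # index of last chunk hit
        count *= last - first + 1
    return count


def total_chunks(domain_shape, chunk_shape):
    """Count the chunks needed to tile the whole domain."""
    count = 1
    for extent, chunk in zip(domain_shape, chunk_shape):
        count *= -(-extent // chunk)  # integer ceiling division
    return count


# Hypothetical refined level: a 64^3 patch inside a 256^3 index space.
domain = (256, 256, 256)
chunk = (32, 32, 32)
patch_start = (64, 64, 64)
patch_shape = (64, 64, 64)

touched = chunks_touched(patch_start, patch_shape, chunk)
total = total_chunks(domain, chunk)
print(f"{touched} of {total} chunks would hold data")
```

Under that assumption, disk usage would scale with the touched chunks rather than with the full domain, and patches that are not aligned to chunk boundaries would touch more chunks than their volume alone suggests.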
