[5.5.3]: Issues with the reading of Zarr datasets #1100

Open
1 task done
drentawc opened this issue Oct 17, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@drentawc

Versions impacted by the bug

v5.x

What went wrong?

I am testing Zarr data access for a piece of Java geographic mapping software. I noted that the NetCDF-Java library supports only the base Zarr v2 spec, so I first tried converting NetCDF4 files to Zarr using the hdf5 and Zarr Python packages. That caused errors. When I asked the Zarr developers what they recommended, they specifically said to use xarray's to_zarr() method for NetCDF files. I have used this method in the past to convert many NetCDF files for Python-specific tests, but NetCDF-Java consistently throws errors when opening these datasets with NetcdfFiles.open(). Is there another way to generate Zarr files from HDF/NetCDF that will work with NetCDF-Java? Or will I need to wait for future versions that support Zarr v3, NCZarr, or Xarray?

Relevant stack trace

Java access code:

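        // Needs java.io.IOException, ucar.nc2.NetcdfFile and ucar.nc2.NetcdfFiles.
        // Open the Zarr store and print its variables; the resource is closed automatically.
        try (NetcdfFile ncfile = NetcdfFiles.open(zarrPath)) {
            System.out.println(ncfile.getVariables());
        } catch (IOException ioe) {
            System.out.println(ioe);
        }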

Relevant log messages

Exception in thread "main" java.io.IOException: java.lang.IllegalArgumentException: Cannot determine attribute's type
at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:279)
at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:243)
at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:216)
at nczarrtest.Zarr.netCdfRead(Zarr.java:85)
at nczarrtest.Zarr.main(Zarr.java:52)
Caused by: java.lang.IllegalArgumentException: Cannot determine attribute's type
at ucar.nc2.Attribute$Builder.setValues(Attribute.java:821)
at ucar.nc2.iosp.zarr.ZarrHeader.lambda$makeAttributes$0(ZarrHeader.java:241)
at java.util.HashMap$KeySet.forEach(HashMap.java:934)
at ucar.nc2.iosp.zarr.ZarrHeader.makeAttributes(ZarrHeader.java:237)
at ucar.nc2.iosp.zarr.ZarrHeader.read(ZarrHeader.java:128)
at ucar.nc2.iosp.zarr.ZarrIosp.build(ZarrIosp.java:59)
at ucar.nc2.NetcdfFiles.build(NetcdfFiles.java:811)
at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:750)
at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:276)
... 4 more

If you have an example file that you can share, please attach it to this issue.

If so, may we include it in our test datasets to help ensure the bug does not return once fixed?
Note: the test datasets are publicly accessible without restriction.

N/A

Code of Conduct

  • I agree to follow the UCAR/Unidata Code of Conduct
drentawc added the bug label Oct 17, 2022
@haileyajohnson
Member

What kind of Zarr dataset are you trying to open? We do currently only support pure Zarr v2.

@drentawc
Author

I created the Zarr dataset with xarray as follows:

import xarray as xa

data = xa.open_dataset(netcdfFilePath)
data.to_zarr('outputs/netcdf.zarr')

I am just trying to convert a NetCDF file into Zarr and eventually access it from an S3 bucket using NetCDF-Java. I would expect this method of creating a dataset to adhere to the pure Zarr v2 spec, since the Zarr developers recommend using xarray to convert NetCDF files to Zarr. If it does not, I am not sure how to create a Zarr store that NetCDF-Java can read.
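
One way to check the "pure Zarr v2" assumption is to look at the metadata files in the store directly. The following is a minimal sketch, assuming the store is a local directory in the Zarr v2 layout, where each group has a .zgroup file and each array has a .zarray file, both JSON with a zarr_format field. The function name zarr_format_by_node is hypothetical and is not part of xarray, zarr, or NetCDF-Java.

```python
import json
import os


def zarr_format_by_node(store_path):
    """Map each node path (relative to the store root) to its declared zarr_format."""
    formats = {}
    for dirpath, _dirnames, filenames in os.walk(store_path):
        # In the Zarr v2 directory layout, groups carry .zgroup and arrays carry .zarray.
        for meta_name in (".zgroup", ".zarray"):
            if meta_name in filenames:
                meta_path = os.path.join(dirpath, meta_name)
                with open(meta_path, encoding="utf-8") as handle:
                    meta = json.load(handle)
                node = os.path.relpath(dirpath, store_path)
                formats[node] = meta.get("zarr_format")
    return formats


if __name__ == "__main__":
    # Path taken from the to_zarr() call above; adjust to the actual store location.
    for node, fmt in sorted(zarr_format_by_node("outputs/netcdf.zarr").items()):
        print(f"{node}: zarr_format={fmt}")
```

A store where every node reports zarr_format 2 is v2 at the metadata level, but this check says nothing about whether the attributes themselves are readable by NetCDF-Java.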

@haileyajohnson
Member

Could you provide a sample file for us to debug?

@drentawc
Author

Yes, here are a couple of NetCDF files I used, along with their Zarr counterparts created using xarray.
chlor_a_zarr.tar.gz
smdata_zarr.tar.gz

@rschmunk
Contributor

Taking a look at the smdata store, the exception occurs because the coord_ref variable has an _ARRAY_DIMENSIONS attribute which is an empty array.

Simply removing that attribute doesn't solve anything, as I then encounter an invalid regex exception when trying to open the data store.
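
To see whether other variables in a store carry the same kind of attribute, the .zattrs files can be scanned for empty arrays. This is a minimal sketch using only the Python standard library, assuming a local Zarr v2 directory store where attributes live in per-node .zattrs JSON files. The function name find_empty_list_attributes is hypothetical, and the sketch only reports such attributes; it does not address the regex exception.

```python
import json
import os


def find_empty_list_attributes(store_path):
    """Return sorted (zattrs_path, attribute_name) pairs whose value is an empty list."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(store_path):
        if ".zattrs" not in filenames:
            continue
        zattrs_path = os.path.join(dirpath, ".zattrs")
        with open(zattrs_path, encoding="utf-8") as handle:
            attrs = json.load(handle)
        for name, value in attrs.items():
            # An empty JSON array gives no element from which to infer a type.
            if isinstance(value, list) and len(value) == 0:
                hits.append((zattrs_path, name))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical path; substitute the directory extracted from the attached archive.
    for path, name in find_empty_list_attributes("outputs/netcdf.zarr"):
        print(f"{path}: {name} is an empty array")
```

For the smdata store, the expectation from the diagnosis above is that the coord_ref variable's _ARRAY_DIMENSIONS attribute would be reported.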

@drentawc
Author

Ahh, I should have realized that there might be an issue with the _ARRAY_DIMENSIONS attribute. But since removing it doesn't fully resolve the issue, is there a surefire way to convert NetCDF or HDF files to a Zarr store, given that the Zarr team recommends xarray?

@drentawc
Author

Or could it be that the NetCDF files' formatting, attributes, or data are not being properly converted to Zarr by the xarray method?
