update to_netcdf docstring to list support for explicit CDF5 writes #8985

Open
JulioTBacmeister opened this issue Apr 30, 2024 · 4 comments

Is your feature request related to a problem?

I cannot get to_netcdf() to write files in CDF5 format as identified by the 'ncdump -k' command.

Describe the solution you'd like

When I write a netcdf file using:

D.to_netcdf( filename )

then ask ncdump to tell me the kind of file I have,

ncdump -k filename

it returns 'netCDF-4'. Unfortunately, this file won't work in the Community Atmosphere Model (CAM), as an initial condition for example. CAM will bomb when it tries to read it. After converting the file with this command:

nccopy -k cdf5 filename cdf5_filename

the file now works in CAM. Also, the command

ncdump -k cdf5_filename

returns 'cdf5'.

I confess I don't know what the nccopy command is doing, but it seems to be needed for the file to be readable by CAM. I am looking for an option in the to_netcdf method that will explicitly write 'cdf5' files without needing to resort to the nccopy command.

Describe alternatives you've considered

Writing netcdf-4 files from xarray and converting via
nccopy -k cdf5 filename cdf5_filename

Additional context

No response


welcome bot commented Apr 30, 2024

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

Contributor

dcherian commented Apr 30, 2024

Looks like we allow customizing the format in to_netcdf (though the docstring is out of date). By default we choose "NETCDF4" which explains the behaviour you see.

Looking at https://unidata.github.io/netcdf4-python/#creatingopeningclosing-a-netcdf-file, what you want is confusingly named "NETCDF3_64BIT_DATA"

import numpy as np
import xarray as xr

ds = xr.Dataset({'a': np.arange(10)})
ds.to_netcdf("foo.nc", format="NETCDF3_64BIT_DATA")

Then ncks -k foo.nc returns cdf5.

I'll keep this issue open to update the docstring.

> I don't know what the nccopy command is doing

It's rewriting the data in the cdf5 format.
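
A quick way to confirm which on-disk format a write produced, independent of ncdump, is to inspect the file's leading magic bytes. The following is a minimal, stdlib-only sketch. The helper name `netcdf_kind` is hypothetical, and it assumes the HDF5 signature sits at offset 0 (HDF5 also permits a user block ahead of it, which this ignores). Unlike `ncdump -k`, it cannot distinguish 'netCDF-4' from 'netCDF-4 classic model'.

```python
# Leading bytes of the three classic-family netCDF variants.
_MAGIC = {
    b"CDF\x01": "classic",
    b"CDF\x02": "64-bit offset",
    b"CDF\x05": "cdf5",
}
# netCDF-4 files are HDF5 files and start with the HDF5 signature.
_HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def netcdf_kind(path):
    """Guess the netCDF variant of `path` from its first bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(_HDF5_SIGNATURE):
        return "netCDF-4 (HDF5)"
    # Anything that is neither HDF5 nor a known CDF magic is reported as unknown.
    return _MAGIC.get(head[:4], "unknown")
```

Calling `netcdf_kind("foo.nc")` on a file written with format="NETCDF3_64BIT_DATA" should then report "cdf5", provided the write succeeded.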

dcherian changed the title from "support for explicit CDF5 writes" to "update to_netcdf docstring to list support for explicit CDF5 writes" on Apr 30, 2024

Author

JulioTBacmeister commented Apr 30, 2024

Thanks Deepak,
I appreciate your looking into this.

I had actually been using

ds.to_netcdf("foo.nc",  format="NETCDF3_64BIT_OFFSET",  engine='scipy' )

in my script earlier. The engine='scipy' is there because without it the write hangs. I put it there based on a suggestion from somebody at CISL. This was a good solution for creating initial condition files in the 11GB range.

Upon doubling resolution the IC files go to ~44GB and this no longer works. Hence my questions about 'cdf5'.

I have just tried writing my 44GB IC file in 2 ways:

  1. ds.to_netcdf("foo.nc", format="NETCDF3_64BIT_DATA", )
  2. ds.to_netcdf("foo.nc", format="NETCDF3_64BIT_DATA", engine='scipy' )

Unfortunately, (1) hangs during write, and (2) returns this error:

ValueError: invalid format for scipy.io.netcdf backend: 'NETCDF3_64BIT_DATA'

I might blame memory issues for the failure of (1) except that a straight default write to netCDF-4:

  ds.to_netcdf("foo.nc")

works just fine.
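
Since the default netCDF-4 write succeeds and nccopy produces a file CAM can read, the two steps can be chained from the same script until a direct CDF5 write works. This is only a sketch of that workaround: the helper names `nccopy_cdf5_command` and `convert_to_cdf5` are hypothetical, and it assumes nccopy is installed and on PATH.

```python
import shutil
import subprocess


def nccopy_cdf5_command(src, dst):
    """Build the argv for the nccopy conversion quoted in this thread (hypothetical helper)."""
    return ["nccopy", "-k", "cdf5", str(src), str(dst)]


def convert_to_cdf5(src, dst):
    """Rewrite `src` as a CDF5 file at `dst` by calling nccopy (hypothetical helper)."""
    if shutil.which("nccopy") is None:
        raise RuntimeError("nccopy not found on PATH")
    # check=True raises CalledProcessError if nccopy exits with a non-zero status.
    subprocess.run(nccopy_cdf5_command(src, dst), check=True)
    return dst


# Intended use, after a default write that is known to work:
#   ds.to_netcdf("foo.nc")
#   convert_to_cdf5("foo.nc", "foo_cdf5.nc")
```

This costs a second full pass over the 44GB file, so it is a stopgap rather than a fix for the hang.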

Contributor

dcherian commented Apr 30, 2024

Yes, the scipy engine only supports classic netCDF3 files.

Are you using dask to write? I assume so given the size.

We've had reports of a deadlock with newer netCDF versions (#7079, #3961) that no one has resolved yet so perhaps downgrading to 1.6.0 will help.
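
Before downgrading anything, it helps to record which versions are actually installed in the environment where the write hangs. The sketch below uses only the standard library. The helper name `installed_versions` is hypothetical, and the list of distribution names is an assumption about which packages are relevant here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("xarray", "netCDF4", "dask", "scipy")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Posting this output alongside a report of the hang makes it easier to compare against the linked deadlock issues.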
