Is this a bug, or user error: NotImplementedError: Dataset is not picklable #735

Open
bnlawrence opened this issue Mar 12, 2024 · 4 comments
Labels
bug Something isn't working dask Relating to the use of Dask

Comments

@bnlawrence

python
Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:53:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cf
>>> cf.__version__
'3.16.1'

I am attempting to use cf-python to read PP files and write netCDF. The code is:

import cf
import dask

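# Use the process-based scheduler with 12 workers
dask.config.set(scheduler='processes', num_workers=12)


def convert(glob):
    # Read every matching PP file, then write all fields to one netCDF file
    ff = cf.read(glob)
    cf.write(ff, 'all_year.nc', mode='w')


if __name__ == "__main__":
    convert('*.pp')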

Platform is JASMIN sci6; the data is N1280 PP output.

Error log here

@bnlawrence bnlawrence added the question General question label Mar 12, 2024
@davidhassell
Collaborator

Hi Bryan,

A bit of digging suggests that this is a bug (pydata/xarray#1464 has the details). However, the writing is locked anyway (a netCDF4-python restriction), so there shouldn't be any benefit in this case from running on 12 workers.
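For illustration, the failure mode can be reproduced without cf-python or netCDF4 at all. A process-based scheduler has to serialise (pickle) whatever it sends to its worker processes, and an object that refuses to be pickled stops that step. The sketch below is only an assumption about the mechanism: `FakeDataset` and `try_pickle` are hypothetical names, and the error message is copied from the issue title rather than taken from any library's source.

```python
import pickle


class FakeDataset:
    """Hypothetical stand-in for an open file handle that refuses to be pickled."""

    def __init__(self, path):
        self.path = path

    def __reduce__(self):
        # Mimics the message in the issue title; illustrative only.
        raise NotImplementedError("Dataset is not picklable")


def try_pickle(obj):
    """Return the pickled bytes, or the error text if pickling is refused."""
    try:
        return pickle.dumps(obj)
    except NotImplementedError as exc:
        return f"NotImplementedError: {exc}"


result = try_pickle(FakeDataset("all_year.nc"))
print(result)
```

With a threaded or synchronous scheduler nothing needs to cross a process boundary, so this pickling step is never attempted.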

If you remove the dask.config.set(...) line, I suspect that it will work.

I shall make the fix, though, so that your original code doesn't fail.

@davidhassell davidhassell added bug Something isn't working and removed question General question labels Mar 12, 2024
@davidhassell
Collaborator

> I shall make the fix, though, so that your original code doesn't fail.

Looking into how xarray deals with this (which I haven't wholly understood, yet), it's probably not the 5 minute fix I dreamt of, but I'll keep at it ...
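One general pattern for this kind of problem, sketched here as an assumption and not as a description of xarray's code or of the eventual cf-python fix, is to pickle only the recipe for reopening the file (path and mode) and never the open handle itself. `ReopenableFile` and its methods are hypothetical names, and an ordinary text file stands in for a netCDF dataset.

```python
import os
import pickle
import tempfile


class ReopenableFile:
    """Hypothetical wrapper: pickles the path and mode, never the open handle."""

    def __init__(self, path, mode="r"):
        self.path = path
        self.mode = mode
        self._handle = None  # opened lazily, once per process

    def acquire(self):
        # Open on first use; an unpickled copy reopens the file for itself.
        if self._handle is None:
            self._handle = open(self.path, self.mode)
        return self._handle

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def __getstate__(self):
        # Only the information needed to reopen the file is serialised.
        return {"path": self.path, "mode": self.mode}

    def __setstate__(self, state):
        self.path = state["path"]
        self.mode = state["mode"]
        self._handle = None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w") as f:
        f.write("one 2-d field")

    original = ReopenableFile(path)
    original.acquire()  # the handle is now open in this process

    # Pickling succeeds because the open handle is not part of the state.
    clone = pickle.loads(pickle.dumps(original))
    content = clone.acquire().read()

    original.close()
    clone.close()

print(content)
```

The cost of this design is that each worker process opens its own handle, which is harmless for reads but is exactly where a write lock becomes necessary.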

@bnlawrence
Author

(Sorry, I was hoping that I would get benefit from the workers on the read, since the pp bit is slow)

@davidhassell
Collaborator

OK - we can read PP/FF files in parallel, so if you did (ff[0] + 2).array the reads would be parallelised over Dask chunks. Writing, however, is limited to one Dask chunk at a time, and a Dask chunk equates to one 2-d UM field, so there is no benefit from parallelism in the writing case :(
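The asymmetry between reading and writing can be sketched with the standard library alone. This is a toy model under stated assumptions, not cf-python's implementation: threads stand in for Dask workers, `read_chunk` and `write_chunk` are hypothetical helpers, and a plain list stands in for the output file.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

write_lock = threading.Lock()
written = []  # stands in for the output file


def read_chunk(i):
    # Stand-in for reading (and computing on) one chunk; reads may overlap.
    return [i, i + 2]


def write_chunk(i, data):
    # The lock admits one writer at a time, so extra workers just queue here.
    with write_lock:
        written.append((i, data))


def process(i):
    write_chunk(i, read_chunk(i))


with ThreadPoolExecutor(max_workers=4) as pool:
    # Consume the iterator so that every task has finished before continuing.
    list(pool.map(process, range(6)))

print(sorted(written))
```

However many workers are configured, the section under the lock runs one chunk at a time, which is why extra workers help the read but not the write.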

@davidhassell davidhassell added the dask Relating to the use of Dask label Apr 25, 2024