TypeError: cannot pickle '_io.BufferedReader' object when trying to modify an xarray.DataArray opened with fsspec's filecache #711

Open
abarciauskas-bgse opened this issue Nov 9, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@abarciauskas-bgse

👋🏽 Hoping someone on the team can help us figure out how to use fsspec's filecache with NetCDF data when we need to modify the xarray.DataArray with rioxarray. Right now that fails with TypeError: cannot pickle '_io.BufferedReader' object. The traceback led us to believe this comes from the deep copy operation at https://github.com/corteva/rioxarray/blob/master/rioxarray/rioxarray.py#L1102:

obj_copy = self._obj.copy(deep=True)
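
For reference, the same kind of error can be reproduced with the standard library alone. This is a minimal sketch, assuming only that some object reachable from the copied data holds an open binary file handle; the `Wrapper` class and `try_deepcopy` helper are hypothetical stand-ins and are not part of xarray, rioxarray, or fsspec.

```python
import copy
import os
import tempfile


class Wrapper:
    """Hypothetical stand-in for any object that keeps an open file handle."""

    def __init__(self, handle):
        self.handle = handle


def try_deepcopy(obj):
    """Return 'ok', or the name of the exception raised by copy.deepcopy."""
    try:
        copy.deepcopy(obj)
    except Exception as exc:
        return type(exc).__name__
    return "ok"


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.bin")
    with open(path, "wb") as out:
        out.write(b"example bytes")

    # open(..., "rb") returns a buffered reader, the type named in the error
    with open(path, "rb") as handle:
        handle_type = type(handle).__name__
        outcome = try_deepcopy(Wrapper(handle))

print(handle_type, outcome)
```

The exact wording of the exception message can differ between Python versions, so the sketch only reports the exception type.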

Code Sample

import fsspec
from morecantile import Tile
from rio_tiler.constants import WEB_MERCATOR_TMS
import numpy as np
import xarray as xr
import shutil
import pandas as pd

tms = WEB_MERCATOR_TMS
tile_bounds = tms.xy_bounds(Tile(x=0, y=0, z=0))
dst_crs = tms.rasterio_crs

protocol = 's3'
file_url = 's3://chunk-tests/3B42_Daily.19980101.7.nc4'
cache_storage_dir = 'fsspec-cache'

cache_options = ['filecache', 'blockcache']
inplace_options = [True, False]
# We can add `True` to this list, but `True` always returns AttributeError: __enter__ 
lock_options = [False]

xr_args = {
    'engine': 'h5netcdf'
}

def rio_clip_box(da):
    try:
        crs = da.rio.crs or "epsg:4326"
        da.rio.write_crs(crs, inplace=True)   
        # also with no data     
        da = da.rio.clip_box(*tile_bounds, crs=dst_crs)
    except Exception as e:
        return f"❌ {type(e).__name__}: {e}".replace('\n', ' ')
    return '✅'

def rio_write_nodata(da, inplace: bool = True):
    try:
        da.rio.write_nodata(np.nan, inplace=inplace)
    except Exception as e:
        return f"❌ {type(e).__name__}: {e}".replace('\n', ' ')
    return '✅'

columns = ('cache_option', 'inplace_option', 'lock_option', 'clip_box', 'write_nodata')
results = []
for cache_option in cache_options:
    for inplace_option in inplace_options:
        for lock_option in lock_options:
            params = (cache_option, inplace_option, lock_option)
            filecache_fs = fsspec.filesystem(cache_option, target_protocol=protocol, cache_storage=cache_storage_dir)
            file_opener = filecache_fs.open(file_url, mode='rb')
            xr_args['lock'] = lock_option
            try:
                ds = xr.open_dataset(file_opener, **xr_args)
            except Exception as e:
                results.append(params + (f"❌ {type(e).__name__}: {e}".replace('\n', ' '), f"❌ {type(e).__name__}: {e}".replace('\n', ' ')))
                continue
            da = ds['precipitation']
            da = da.rename({'lon': 'x', 'lat': 'y'})
            da = da.transpose("time", "y", "x", missing_dims="ignore")
            rio_write_nodata_result = rio_write_nodata(da, inplace=inplace_option)
            clip_box_result = rio_clip_box(da)
            results.append(params + (clip_box_result, rio_write_nodata_result))
            shutil.rmtree(cache_storage_dir)

df = pd.DataFrame(data=results, columns=columns)
df.to_markdown("results.md", index=False, tablefmt="github")

| cache_option | inplace_option | lock_option | clip_box | write_nodata |
|--------------|----------------|-------------|----------|--------------|
| filecache | True | False | ❌ TypeError: cannot pickle '_io.BufferedReader' object | ✅ |
| filecache | False | False | ❌ TypeError: cannot pickle '_io.BufferedReader' object | ❌ TypeError: cannot pickle '_io.BufferedReader' object |
| blockcache | True | False | ✅ | ✅ |
| blockcache | False | False | ✅ | ✅ |

Problem description

rioxarray operations that copy the object (clip_box, and write_nodata with inplace=False) fail with a TypeError on an xarray.DataArray opened through fsspec's filecache.

Expected Output

Modified xarray.DataArray

Environment Information

python -c "import rioxarray; rioxarray.show_versions()"

returns

rioxarray (0.15.0) deps:
  rasterio: 1.3.8
    xarray: 2023.10.0
      GDAL: 3.6.4
      GEOS: 0.0.0
      PROJ: 9.0.1
 PROJ DATA: /Users/aimeebarciauskas/mambaforge/share/proj
 GDAL DATA: /Users/aimeebarciauskas/mambaforge/share/gdal

Other python deps:
     scipy: None
    pyproj: 3.6.0

System:
    python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:39:40) [Clang 15.0.7 ]
executable: /Users/aimeebarciauskas/mambaforge/bin/python
   machine: macOS-10.15.7-x86_64-i386-64bit

python -c "import fsspec; print(fsspec.__version__)"

returns

2023.9.0

Installation method

pip

@abarciauskas-bgse added the bug label Nov 9, 2023
@abarciauskas-bgse
Author

It might be worth noting that if you don't remove the cache directory after each run of the two functions, every combination fails with ❌ TypeError: cannot pickle '_io.BufferedReader' object for clip_box, and for write_nodata when inplace=False. So rioxarray is not able to work with files from fsspec's blockcache either.
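
The inplace=False detail fits a "copy unless in place" pattern, where only the copying path ever touches the file handle. The sketch below illustrates that pattern under stated assumptions: `Accessor`, `set_attr`, and `attempt` are hypothetical names made up for illustration, and this is not rioxarray's actual implementation.

```python
import copy
import os
import tempfile


class Accessor:
    """Hypothetical accessor that deep-copies its object unless inplace=True."""

    def __init__(self, obj):
        self._obj = obj

    def set_attr(self, key, value, inplace=True):
        # Only the inplace=False path copies, so only that path can hit
        # an uncopyable file handle stored alongside the data.
        target = self._obj if inplace else copy.deepcopy(self._obj)
        target["attrs"][key] = value
        return target


def attempt(accessor, inplace):
    """Return 'ok', or the name of the exception raised by set_attr."""
    try:
        accessor.set_attr("nodata", float("nan"), inplace=inplace)
    except Exception as exc:
        return type(exc).__name__
    return "ok"


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.bin")
    with open(path, "wb") as out:
        out.write(b"example bytes")

    with open(path, "rb") as handle:
        # The dict stands in for a data object that references its source file
        accessor = Accessor({"attrs": {}, "source": handle})
        results = {inplace: attempt(accessor, inplace) for inplace in (True, False)}

print(results)
```

In this toy version the in-place call succeeds and the copying call raises, which matches the write_nodata behaviour described above.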

@snowman2
Member

Related #614. Possible duplicate.
