Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for DataArray objects read through rioxarray #445

Open
darribas opened this issue Jan 19, 2022 · 2 comments · May be fixed by #455
Open

Add support for DataArray objects read through rioxarray #445

darribas opened this issue Jan 19, 2022 · 2 comments · May be fixed by #455

Comments

@darribas
Copy link
Member

The raster functionality in for weights building currently supports DataArray objects created through xarray.open_rasterio:

import xarray
from libpysal.weights import Queen
pop = xarray.open_rasterio(
               'https://geographicdata.science/book/_downloads/5263090bd0bdbd7d1635505ff7d36d04/ghsl_sao_paulo.tif'
)
w = Queen.from_xarray(pop)

The code above correctly picks up that missing data are specified with -200 and removes them from the calculation of the weights:

w.n
> 97232

This is not the case when we use DataArray objects build through the (generally recommended) rioxarray.open_rasterio:

import rioxarray
pop2  = rioxarray.open_rasterio(
               'https://geographicdata.science/book/_downloads/5263090bd0bdbd7d1635505ff7d36d04/ghsl_sao_paulo.tif'
)
w2 = Queen.from_xarray(pop2)

which does not pick up the missing data:

w.n
> 194688

My hunch is that this is because the builder picks up missing values only through the attrs object:

if "nodatavals" in da.attrs and da.attrs["nodatavals"]:
mask = (ser != da.attrs["nodatavals"][0]).to_numpy()
ids = np.where(mask)[0]
id_map = _idmap(ids, mask, dtype)
ser = ser[ser != da.attrs["nodatavals"][0]]

But rioxarray does not populate that general object, instead logs missing data under pop2.rio.nodata:

pop.attrs
> {'transform': (250.0, 0.0, -4482000.0, 0.0, -250.0, -2822000.0),
 'crs': '+proj=moll +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs=True',
 'res': (250.0, 250.0),
 'is_tiled': 0,
 'nodatavals': (-200.0,),
 'scales': (1.0,),
 'offsets': (0.0,),
 'AREA_OR_POINT': 'Area',
 'grid_mapping': 'spatial_ref'}

pop2.attrs
> {'_FillValue': -200.0, 'scale_factor': 1.0, 'add_offset': 0.0}

pop2.rio.nodata
> -200.0

I think we should support both formats, read through vanilla xarray and with rioxarray loaded in the session too. The former is a more generic approach, the latter is a more robust one for rasters.

Not sure what'd be the ideal way of supporting this. I'm cc'ing here @MgeeeeK since he worked on the original implementation and might have some ideas, and @snowman2 as the main driver behind rioxarray in case they have some suggestions in terms of best practices.

@snowman2
Copy link

This is a helpful reference: https://corteva.github.io/rioxarray/stable/getting_started/nodata_management.html

@darribas darribas linked a pull request Jan 29, 2022 that will close this issue
5 tasks
@darribas
Copy link
Member Author

I've issued a PR (#455 ) that I think should have a more comprehensive approach following @snowman2 suggestion from rioxarray. If tests pass, I think it should be good to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants