Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rioxarray.write_transform() not changing transform #698

Open
nilsleh opened this issue Sep 13, 2023 · 8 comments
Open

rioxarray.write_transform() not changing transform #698

nilsleh opened this issue Sep 13, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@nilsleh
Copy link

nilsleh commented Sep 13, 2023

Problem description

I am working with CMIP6 data and looking at utilizing rioxarray for possibly combining different data streams, so I am looking for a general approach through rioxarray. For this am using the merge_arrays() function, with the hope of applying it to not just for these specific CMIP6 files. However, for these files I am encountering rasterio.errors.WindowError: Bounds and transform are inconsistent Errors which seem to be due to a wrong transformation being read from the src files. The output_transform is computed correctly and works, and therefore I was thinking about setting the transform beforehand to the array. However, write_transform() does not change the transformation in this case, so I am wondering what I am missing or if there is another way to go about this. Data for the following example can be found here.

Code Sample

import xarray as xr
from rasterio.transform import Affine
from rioxarray.merge import merge_arrays

output_transform = Affine(1.0, 0.0, -0.5, 0.0, -1.0, 329.5)

arr_one = xr.open_dataset("CanESM5_r1i1p1f1_tos.nc")["tos"].squeeze()
arr_one.rio.set_spatial_dims("i", "j", inplace=True)

print(f"Before: {arr_one.rio.transform()}")
arr_one.rio.write_transform(output_transform, inplace=True)
print(f"After: {arr_one.rio.transform()}")

arr_two = xr.open_dataset("CanESM5_r1i1p1f1_zos.nc")["zos"].squeeze()
arr_two.rio.set_spatial_dims("i", "j", inplace=True)
arr_two.rio.write_transform(output_transform, inplace=True)

output = merge_arrays([arr_one, arr_two])

Environment Information

rioxarray - 0.15.0
rasterio - 1.3.6
xarray - 2023.6.0

@nilsleh nilsleh added the bug Something isn't working label Sep 13, 2023
@adamjstewart
Copy link

@snowman2 can you comment on this issue? We would like to add support for HDF5/NetCDF to TorchGeo and rioxarray seems to be the correct way to simulate a rasterio file but we're stuck on the above example. We're not sure if we're doing something wrong or this is a bug in rioxarray.

@snowman2
Copy link
Member

snowman2 commented Nov 9, 2023

Inspecting the file:
image

It appears that the dimensions i and j don't contain the geospatial coordinate information and instead contain the indexes of the grid. This is problematic as those coordinates are used to re-calculate the resolution/transform when the exist even if the transform has been written as it is usually more reliable (this happens quite a bit in the rioxarray code). For best results, I recommend replacing those values with the correct x & y geospatial coordinate arrays.

@snowman2
Copy link
Member

snowman2 commented Nov 9, 2023

Side note: rio.write_transform is helpful to add the transform when the coordinate arrays do not exist (to save disk space).

@snowman2
Copy link
Member

snowman2 commented Nov 9, 2023

image

@snowman2
Copy link
Member

snowman2 commented Nov 9, 2023

This is how rioxarray adds coordinates based on the transform:

rioxarray/rioxarray/_io.py

Lines 1203 to 1206 in c15b860

if parse_coordinates:
coords.update(
_generate_spatial_coords(riods.transform, riods.width, riods.height)
)

@nilsleh
Copy link
Author

nilsleh commented Nov 13, 2023

Thanks for the suggestion. For the zip file previously attached, the case seems to be that the defined coordinate grid from longitude(i,j) is not equally spaced for the coordinates. For example, inspecting arr_one.longitude.data[:,0]. Thus I am not sure how replacing the i, j coordinates with an appropriate geospatial coordinate 1d arrays would go and it seems that that is a separate issue.

So apart from that I included some additional data here where the coordinate issue is not of concern, which might help to get more closely to understanding the actual transform error rasterio.errors.WindowError: Bounds and transform are inconsistent.

As background information, for the torchgeo dataset implementation we are trying to achieve the following:

  • support different data streams or sources that are .nc or .hdf5 file (i.e. CMIP data etc.)
  • for a particular region of interest or bounding box, extract the respective area for each data variable of interest and combine the array by merging them spatially
import xarray as xr
from rioxarray.merge import merge_arrays


query = {
    "minx": 0,
    "miny": 0,
    "maxx": 50,
    "maxy": 50
}

_crs = "EPSG:4326"
spatial_x_name = "longitude"
spatial_y_name = "latitude"

data_variables = ["sst", "sla"]

alt_ds= xr.open_dataset("/eo/dt_global_twosat_phy_l4_199311_vDT2021-M01.nc")
alt_ds = alt_ds.assign_coords(longitude=(alt_ds.longitude % 360 - 180)).roll(
        longitude=(alt_ds.dims["longitude"] // 2)
    )
sst_ds = xr.open_dataset("/eo/HadISST1_SST_update.nc")
sst_ds = sst_ds.sortby("latitude", ascending=True)

data_arrays = []
for ds in [sst_ds, alt_ds]:
    ds.rio.set_spatial_dims(spatial_x_name, spatial_y_name, inplace=True)

    for variable in data_variables:
        if hasattr(ds, variable):
            da = ds[variable]
            if not da.rio.crs:
                da.rio.write_crs(_crs, inplace=True)
            elif da.rio.crs != _crs:
                da = da.rio.reproject(_crs)
            # clip box ignores time dimension
            clipped = da.rio.clip_box(
                minx=query["minx"], miny=query["miny"], maxx=query["maxx"], maxy=query["maxy"]
            )
            # rioxarray expects this order
            clipped = clipped.transpose("time", spatial_y_name, spatial_x_name, ...)

            data_arrays.append(clipped.squeeze())


merged_data = merge_arrays(
    data_arrays, bounds=(query["minx"], query["miny"], query["maxx"], query["maxy"])
).data

Running that snippet I get:

ile "/home/nils/miniconda3/envs/torchEnv/lib/python3.9/site-packages/rioxarray/merge.py", line 175, in merge_arrays
    merged_data, merged_transform = _rio_merge(
  File "/home/nils/miniconda3/envs/torchEnv/lib/python3.9/site-packages/rasterio/merge.py", line 341, in merge
    src_window = windows.from_bounds(int_w, int_s, int_e, int_n, src.transform)
  File "/home/nils/miniconda3/envs/torchEnv/lib/python3.9/site-packages/rasterio/windows.py", line 323, in from_bounds
    raise WindowError("Bounds and transform are inconsistent")
rasterio.errors.WindowError: Bounds and transform are inconsistent

So I think there is a mismatch between how the bounds and transforms are defined vs how they are expected in rioxarray and I am not sure where exactly this difference is.

@snowman2
Copy link
Member

#209 may be helpful.

@adamjstewart
Copy link

@snowman2 thanks for the hint! We'll see if that addresses the coordinate issue in the first paragraph of #698 (comment). Do you have any thoughts on the other paragraphs of that comment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants