Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stackstac.stack provides shifted pixel coordinnates compare to qgis/rioxarray #207

Open
Berhinj opened this issue Jul 17, 2023 · 5 comments

Comments

@Berhinj
Copy link

Berhinj commented Jul 17, 2023

Default stackstac.stack parameters are (in my experience) for all stac catalogs I tried providing incorrect pixel coordinates.

First here is an example showing up the shift created by stackstac.stack default parameters, should we default 'snap_bounds' to False & 'xy_coords' to "center" instead?

Let's gather some data

import stackstac
import pystac_client
import rasterio
import rioxarray as rxr

utm = 32615
bbox = [-90.47116362,  33.50472389, -90.37408012,  33.54915847] # wgs84
buffer = 10
smaller_bbox = [729975-buffer, 3770025-buffer, 729975+buffer, 3770025+buffer] # local utm

params = dict(
    collections=["sentinel-2-l2a"],
    bbox=bbox,
    datetime='2023-06-01/2023-06-15',
    query={
            # "eo:cloud_cover": {"lt": 20},
            # "landsat:collection_category": {"eq": "T1"},
            # "landsat:collection_number": {"eq": "02"},
            # "platform": {"in": ["LANDSAT_8", "LANDSAT_9"]},
            # "landsat:correction": {"eq": "L2SP"},
        },
)
href = "https://earth-search.aws.element84.com/v1"
c = pystac_client.Client.open(href)

s = c.search(**params)
_ = s.matched()
xs = s.item_collection()

Then do some plots

img_rxr = rxr.open_rasterio(xs[0].assets["red"].href).squeeze()
img_rxr.rio.clip_box(*smaller_bbox).plot()

Screenshot 2023-07-17 at 16 38 01

stac_img = stackstac.stack(xs, assets=["red"])\
    .sel(time = str(xs[0].properties["datetime"])[:-2])\
    .squeeze()
stac_img.rio.clip_box(*smaller_bbox).plot.imshow()

Screenshot 2023-07-17 at 16 39 13

stac_img = stackstac.stack(xs, assets=["red"], snap_bounds=False, xy_coords="center", )\
    .sel(time = str(xs[0].properties["datetime"])[:-2])\
    .squeeze()
stac_img.rio.clip_box(*smaller_bbox).plot.imshow()

Screenshot 2023-07-17 at 16 38 56

Please confirm it's not only me. It seems like the convention default is confusing right?

@gjoseph92
Copy link
Owner

gjoseph92 commented Jul 17, 2023

It seems like the convention default is confusing right?

There's not a single convention for whether coordinates should refer to the center of a pixel versus to its top-left corner (or even somewhere else): https://gis.stackexchange.com/questions/122670/is-there-a-standard-for-the-coordinates-of-pixels-in-georeferenced-rasters

  • GDAL uses the top-left corner
  • I can't find any documentation that QGIS coordinates refer to pixel centers, as you're suggesting. Since QGIS is basically a big GUI around GDAL, I'd assume that its coordinates also refer to top-left corners.
  • Rasterio uses the top-left corner, since it wraps GDAL.
  • The default for GeoTIFFs is "PixelIsArea", "with (0,0) denoting the upper-left corner of the image".
  • ArcGIS maybe uses pixel centers? The location of the dots in this diagram looks like pixel centers, but I can't find it stated clearly anywhere.

In my personal experience with geospatial tools, top-left-corner is far more common/conventional, but I'm sure it's different for others depending on what tools you're used to.

To me, rioxarray is the notable exception in the Python geospatial world, in that it intentionally shifts the coordinates by a half-pixel offset to align with pixel centers. My understanding is that this is meant to match xarray's model, where coordinates often refer to the center of pixels (I can't find a link to this in the docs though). However, I also understood that this convention came from xarray's origins working with weather and climate data (where I think "pixel is point" is more common, which expects the half-pixel offset), not that it's a requirement for representing coordinates in xarray.

Anyway, I chose to make top-left corner the default for stackstac because it seemed more consistent to me with most geospatial conventions broadly, rather matching one particular library (rioxarray) but introducing confusion for everyone else. But the xy_coords="center" option exists for this reason, so if you want to use stackstac along with rioxarray, I'd suggest using that.

As for snap_bounds=True, my logic was that partially-sized pixels make analysis more difficult and introduce a possible source of error if you're doing any sort of spatial aggregation (how much land area is deforested, what's the average NDVI, etc.). So a reasonable default of True protects you from these sorts of subtle mistakes—plus makes it easer to align multiple datasets, by just matching resolution and CRS. If you're aware of the nuances of this, and want unaligned pixels, you can pass False.


Ultimately both of these are just the defaults that I picked, thinking they'd be the least confusing (for reasons explained above). I'd be very happy to hear others weigh in on whether these defaults also seem good to them.

As for actually changing them, that would be a subtle and significant breaking change for most stackstac users, so I'd definitely be conservative about making any changes. But if there are good reasons to change either of them, it still would be worth doing.

@Berhinj
Copy link
Author

Berhinj commented Jul 18, 2023

First, thanks a lot for your support, that's amazing, plus this package in general is making my life so much easier.

To be honest I thought using stackstac with rioxarray is what most of us do but I might be taking false assumptions. If it is the case I would say xy_coords="center" would be more suited as default, all scripts I've seen have missed this specificity and it could create issue down the line with cliping for example.

Regarding snap bounds, I love the functionnality, but in the context of sentinel 2 above it shifts the whole scene by a pixel. Which also can be confusing for most users and again I haven't seen much script online editing such value so people might also be getting surprised.

In my case, I'm happy I'm aware, and I hope having this issue created might help people facing similar issue. You definitively have a better view on if it should be changed or not.

@Berhinj Berhinj closed this as completed Jul 18, 2023
@Kirill888
Copy link

@gjoseph92 I would like to disagree with your assessment regarding lack of convention for coordinates in xarray.

Here is a simple notebook example you can try out:

import xarray as xr
import numpy as np
mode = "center"
N = 3
X = Y = np.linspace(0, N-1, N)
if mode == "center":
    X = X + 0.5
    Y = Y + 0.5
pix = np.zeros((len(Y), len(X)))
pix[1, 1] = 1
xr.DataArray(pix, coords={'x': X, 'y': Y}).plot.imshow(size=2);

image

But without adding 0.5, i.e. using left edge coordinates instead of pixel center, you get an image like this:

image

The coordinates returned by rioxarray and odc-stac are not like that because of GDAL, they are like that because that's what xarray library expects to see.

@gjoseph92
Copy link
Owner

Thanks for jumping in here @Kirill888. That's a good point.

Unfortunately, it looks like indexing also assumes pixel-center coordinates:

import xarray as xr
import numpy as np

N = 3
X = Y = np.linspace(0, N-1, N)
pix = np.zeros((len(Y), len(X)))
pix[1, 1] = 1
arr_tl = xr.DataArray(pix, coords={'y': Y, 'x': X})
arr_c = xr.DataArray(pix, coords={'y': Y + 0.5, 'x': X + 0.5})
arr_c.sel(x=0.9, y=1, method='nearest')
# <xarray.DataArray ()>
# array(0.)
# Coordinates:
#     y        float64 1.5
#     x        float64 0.5

arr_tl.sel(x=0.9, y=1, method='nearest')
# <xarray.DataArray ()>
# array(1.)
# Coordinates:
#     y        float64 1.0
#     x        float64 1.0

You'd expect (0.9, 1) to fall within a 0 pixel.

That's a strong argument for changing the default.

One place I can think of so far as being confusing is when you're getting coordinates back out of xarray. For example, maybe you're interested in the points with NDVI greater than some threshold to pass into some other analysis.

flat = arr_c.stack(px=["x", "y"], create_index=False)
flat[flat > 0].to_dataframe(name='value')
#       y    x  value
# px                 
# 0   1.5  1.5    1.0

Using pixel-center coordinates gives you, of course, the coordinates of the pixel centers. If you're not aware of this xarray convention and are used to conventions in other geospatial tools, or want to pass these coordinates into another tool, you might end up with a half-pixel offset. Top-left labeled coordinates avoids this.

flat = arr_tl.stack(px=["x", "y"], create_index=False)
print(flat[flat > 0].to_dataframe(name='value'))
#       y    x  value
# px                 
# 0   1.0  1.0    1.0

I think the indexing issue might outweigh getting coordinates out, though.

@gjoseph92 gjoseph92 reopened this Jul 23, 2023
@Kirill888
Copy link

Unfortunately, it looks like indexing also assumes pixel-center coordinates:

looks like xarray is consistent with pixel coordinate notation 🥳

One place I can think of so far as being confusing is when you're getting coordinates back out of xarray. For example, maybe you're interested in the points with NDVI greater than some threshold to pass into some other analysis.

It's a common problem when dealing with image analysis in general. Image can be viewed as a multi-dimensional array (usually 2 or 3 dims), im[iy, ix], where ix,iy are non-negative integers. Or it can be seen as a digitized sampling of some continuous 2d function: f(x, y). Where x, y are floating point numbers, even when they refer to image coordinates directly and not to some world coordinates. I believe this old issue in OpenCV library illustrates the confusion that this duality of interpretation causes.

It's not an easy problem to resolve when you have a lot of code that assumes certain convention, possibly different in different places, in a library with a long history, my request to document pixel coordinate convention remains open...

I personally feel that "top left corner of the top left pixel is 0,0" is a much more common convention than "pixel center of the top-left pixel is placed at 0,0". And it is so for a good reason - image plane spanning from 0,0 to W,H is more intuitive than -1/2, -1/2 to W-1/2, H-1/2. And if one wants to go from continuous pixel coordinates to array index it's just a matter of using ix = int(floor(x)) (assuming 0-based indexing here).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants