Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

possible to inadvertently drop item when proj:bbox is much larger than target bbox in another projection #205

Open
hrodmn opened this issue Jun 23, 2023 · 2 comments

Comments

@hrodmn
Copy link

hrodmn commented Jun 23, 2023

TLDR: Would it be possible to filter STAC items using proj:geometry instead of proj:bbox in prepare_items? This would make it possible to use more than four reprojected polygon coordinates to check overlap between the target bbox and the STAC items.

I have a STAC collection where the items/assets are stored in 15 degree squares in epsg:4326. This might be a terrible storage scheme, but it creates a situation where data gets inadvertently dropped when loading it with stackstac.stack. I can make a reprex if it would be useful but for now I will just start with a description of the problem.

My AOI is fairly small and is defined in epsg:5070. This AOI happens to straddle the 45th parallel, so it is on the border between two of my STAC items. When I load the data via stackstac, one of the items gets dropped presumably because, after reprojecting to epsg:5070, the item proj:bbox does not overlap the AOI's bbox.

Here is a python script that illustrates the situation:

import matplotlib.pyplot as plt

from pyproj import Transformer
from shapely.geometry import box, shape
from shapely.ops import transform

# a small AOI in Minnesota
aoi = box(277615, 2449019, 295781, 2460086)

# geometry from STAC item metadata from two rasters (epsg:4326)
item_geom_1 = shape(
    {
        "type": "Polygon",
        "coordinates": [
            [
                [-105.0, 45.0],
                [-90.0, 45.0],
                [-90.0, 60.0],
                [-105.0, 60.0],
                [-105.0, 45.0],
            ]
        ],
    }
)

item_geom_2 = shape(
    {
        "type": "Polygon",
        "coordinates": [
            [
                [-105.0, 30.0],
                [-90.0, 30.0],
                [-90.0, 45.0],
                [-105.0, 45.0],
                [-105.0, 30.0],
            ]
        ],
    }
)

# transform the item geometries to epsg:5070
transformer = Transformer.from_crs("epsg:4326", "epsg:5070", always_xy=True)

item_geom_1_5070 = transform(transformer.transform, item_geom_1)
item_geom_2_5070 = transform(transformer.transform, item_geom_2)

# aoi should intersect the first STAC item but it doesn't!
aoi.intersects(item_geom_1_5070)

# make a plot
view_bbox = (176409, 2370653, 414088, 2515452)
plt.fill(*item_geom_1_5070.exterior.xy, color="purple")
plt.fill(*item_geom_2_5070.exterior.xy, color="blue")
plt.plot(*aoi.exterior.xy, color="green")
plt.xlim([view_bbox[0], view_bbox[2]])
plt.ylim([view_bbox[1], view_bbox[3]])
plt.savefig("/tmp/5070.png")

5070

After reprojection from epsg:4326 to epsg:5070, the boundary between the STAC item geometries has shifted north of the 45th parallel!

If I reproject the aoi to epsg:4326, everything looks as expected!

# reproject aoi to epsg:4326
transformer = Transformer.from_crs("epsg:5070", "epsg:4326", always_xy=True)
aoi_4326 = transform(transformer.transform, aoi)

view_bbox = (-93.393, 44.460, -90.965, 45.549)
plt.fill(*item_geom_1.exterior.xy, color="purple")
plt.fill(*item_geom_2.exterior.xy, color="blue")
plt.plot(*aoi_4326.exterior.xy, color="green")
plt.xlim([view_bbox[0], view_bbox[2]])
plt.ylim([view_bbox[1], view_bbox[3]])

plt.savefig("/tmp/4326.png")

4326

@pjhartzell recently wrote an article about this exact problem and shared how the stactools package addresses it by densifying the STAC item geometry. I tried that and it seems to fix my issue. Now the STAC item geometries still track the 45th parallel after being reprojected.

import numpy
from shapely.geometry import mapping

# Adapted from https://github.com/developmentseed/rio-stac/blob/main/rio_stac/stac.py#L57-L69
def densify_geom(geom, densify_pts: int):
    # Derived from code found at
    # https://stackoverflow.com/questions/64995977/generating-equidistance-points-along-the-boundary-of-a-polygon-but-cw-ccw
    coordinates = numpy.asarray(geom["coordinates"][0])

    densified_number = len(coordinates) * densify_pts
    existing_indices = numpy.arange(0, densified_number, densify_pts)
    interp_indices = numpy.arange(existing_indices[-1] + 1)
    interp_x = numpy.interp(interp_indices, existing_indices, coordinates[:, 0])
    interp_y = numpy.interp(interp_indices, existing_indices, coordinates[:, 1])
    return shape(
        {
            "type": "Polygon",
            "coordinates": [[(x, y) for x, y in zip(interp_x, interp_y)]],
        }
    )


item_geom_1_dense = densify_geom(mapping(item_geom_1), 10)
item_geom_2_dense = densify_geom(mapping(item_geom_2), 10)

# transform the item geometry to epsg:5070
transformer = Transformer.from_crs("epsg:4326", "epsg:5070", always_xy=True)

item_geom_1_dense_5070 = transform(transformer.transform, item_geom_1_dense)
item_geom_2_dense_5070 = transform(transformer.transform, item_geom_2_dense)

view_bbox = (176409, 2370653, 414088, 2515452)
plt.fill(*item_geom_1_dense_5070.exterior.xy, color="purple")
plt.fill(*item_geom_2_dense_5070.exterior.xy, color="blue")
plt.plot(*aoi.exterior.xy, color="green")
plt.xlim([view_bbox[0], view_bbox[2]])
plt.ylim([view_bbox[1], view_bbox[3]])

plt.savefig("/tmp/5070_densified.png")

5070_densified

If stackstac were to read proj:geometry instead of proj:bbox to check for overlap between items and the target bbox we could avoid this edge case!

@gjoseph92
Copy link
Owner

gjoseph92 commented Jul 19, 2023

Thanks for reporting @hrodmn, and thanks for the nice explanation. I think there are 2 things in this issue:

  1. stackstac is getting the bounds of the assets slightly wrong, because it's naively reprojecting the bounds without densification. See this lovely TODO:
    # TODO: use `rio.warp.transform_bounds` instead, for densification? or add our own densification?

This definitely needs to be fixed! Just densifying the bounding-box before reprojecting it should solve the issue you're facing here.

  1. If stackstac were to read proj:geometry instead of proj:bbox to check for overlap between items and the target bbox we could avoid this edge case!

    This is interesting, because not using geometry to calculate overlap was a somewhat intentional choice. My reasoning was that the geometry field didn't feel as precise as things like proj:bbox or proj:transform + proj:shape, since geometry is always lat-lon, whereas the other fields are in the same projection as the data and also seem more strictly defined. My perception a couple years ago, when I wrote all this, was that geometry was more for display purposes but might not always be pixel-level accurate.

    Indeed, in some cases we can see STAC data in the wild where the geometry field doesn't do densification either. For example, look at MODIS data (like mentioned in the stactools blog post) on Planetary Computer:
    Screen Shot 2023-07-19 at 3 16 36 PM

    The blue path (geometry) doesn't match the actual data. If stackstac just trusted the geometry field, we'd get the same error you're mentioning here.

    Now, that stactools post is really interesting though, because it includes reading the underlying GeoTIFF and clipping the geometry vector to where there's actually raster data. This is super valuable, because it adds additional information to the metadata that you don't otherwise have, and would let us better filter out unnecessary items.

    If that approach becomes widely used, that would be a very compelling reason for stackstac to filter on geometry. Of course, it also has the downside that mis-defined geometries (like shown above) would then cause issues.

    I could see even using a hacky heuristic, that if the geometry contains >4 points, trust it since it provides additional information, otherwise just use proj:bbox since it's slightly more accurate.

Anyway, what to do:

  1. densify bounding boxes before reprojecting them: we should fix this now; I'm working on a PR.
  2. filter for overlap with AOI using geometry instead of bbox: this is a bigger code change, introduces the potential for bad geometries to cause a bad experience, and only adds value over densification if providers are using geometry as a vector nodata mask. I'm very into it if providers start doing that, but probably won't work on that right now.

@gjoseph92
Copy link
Owner

Actually, I realize that there's a much simpler argument for filtering by geometry:

Screen Shot 2023-07-19 at 3 16 36 PM

Say you're loading this MODIS data into EPSG:4326. Your AOI, in lat-lon, does not actually overlap with the footprint of the asset. But the bounding box of the asset, in lat-lon, includes a ton of empty space and does overlap with your AOI.

If we filter by "bounding boxes intersect" (current behavior), you'll still load this scene, and it will be all empty. If we switched to filtering by geometry intersection—whether that geometry is actually the geometry field, or a polygon we generate ourselves from the densified proj:bbox, etc.—we'd be able to drop this unnecessary item. That's pretty compelling.

Still don't think I'll do this immediately, but it does seem worth doing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants