Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable opening datasets with gcps even if no valid crs is present #182

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

weiji14
Copy link

@weiji14 weiji14 commented Sep 23, 2022

Some datasets like Sentinel-1 GRD files don't have a coordinate reference system (crs), but they may have ground control points (gcps) and can still be opened by rasterio.

Test this branch using pip install git+https://github.com/weiji14/stackstac.git@open_vrt_with_gcps

Fixes #181.

Some datasets like Sentinel-1 GRD files don't have a coordinate reference system (crs), but they may have ground control points (gcps) and can still be opened by rasterio.
@weiji14 weiji14 marked this pull request as ready for review September 23, 2022 18:17
@weiji14
Copy link
Author

weiji14 commented Sep 23, 2022

Probably should write a regression test. @scottyhq, any good sample Sentinel-1 GRD files with a stable link that you know of?

Need to reproject from a proper source coordinate reference system (src_crs) instead of arbitrarily.
@scottyhq
Copy link
Contributor

Probably should write a regression test. @scottyhq, any good sample Sentinel-1 GRD files with a stable link that you know of?

GRDs are really big. xarray-sentinel has a minimized one here for testing https://github.com/bopen/xarray-sentinel/tree/main/tests/data/S1B_IW_GRDH_1SDV_20210401T052623_20210401T052648_026269_032297_ECC8.SAFE.

Or rioxarray just constructs a synthetic dataset with GCPs for a test: https://github.com/corteva/rioxarray/blob/bfd616594bb82886133b5f4b9356ed16d39014fc/test/integration/test_integration_rioxarray.py#L1053

Copy link
Owner

@gjoseph92 gjoseph92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a bunch for working on this @weiji14, and thanks for the test too!

stackstac/rio_reader.py Show resolved Hide resolved
Put the src_crs in the vrt_params dict to avoid an elif statement.
Comment on lines +50 to +52
np.testing.assert_allclose(
actual=array, desired=np.array([[0.0, 0.0], [0.0, 0.0]], dtype=np.float32)
)
Copy link
Author

@weiji14 weiji14 Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a bunch for working on this @weiji14, and thanks for the test too!

Don't thank me yet, this is a really bad test 😅 It only checks that things can run (and that a GeoTIFF with gcps works without an error message), but I don't like this assert statement... Any pointers?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, maybe we could read it back using rasterio with (what we think are) the right parameters, and check that the results match?

Only other thing I can think of would be to write some data into the file (probably like np.linspace(...).reshape(...)). Then we could make a more meaningful assertion about the values we get out.

@gjoseph92
Copy link
Owner

@weiji14 did you get any further with this? Is this actually working as is now?

@weiji14
Copy link
Author

weiji14 commented Oct 24, 2022

@weiji14 did you get any further with this? Is this actually working as is now?

Technically, I think the implementation is sound. But the unit test could be much improved. Ok if you want to merge this first and improve the unit test at a later date.

It's not a good idea in dask to mutate inputs. This object could be shared with other tasks.

Also, I think we just need to check `ds.crs is None`—if `ds.crs` is a CRS, it will always have a `to_epsg` method. (Though that could fail, so I've added handling for that case too.)
Copy link
Owner

@gjoseph92 gjoseph92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weiji14 I pushed a change here to avoid mutating spec.vrt_params, since that object could possibly be shared with other dask tasks.

To be clear, in your example in #181, you're still getting the purple/green/yellow output? That makes me think this might not be working correctly yet?

Comment on lines +50 to +52
np.testing.assert_allclose(
actual=array, desired=np.array([[0.0, 0.0], [0.0, 0.0]], dtype=np.float32)
)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, maybe we could read it back using rasterio with (what we think are) the right parameters, and check that the results match?

Only other thing I can think of would be to write some data into the file (probably like np.linspace(...).reshape(...)). Then we could make a more meaningful assertion about the values we get out.

The test is still failing for me though.
@weiji14
Copy link
Author

weiji14 commented Oct 24, 2022

@weiji14 I pushed a change here to avoid mutating spec.vrt_params, since that object could possibly be shared with other dask tasks.

I'm not seeing any new commits in this PR. Could you try again?

To be clear, in your example in #181, you're still getting the purple/green/yellow output? That makes me think this might not be working correctly yet?

If I recall correctly, there's actually 2 things needed to fix the purple/green/yellow output.

  1. Getting the image plotted at (roughly) the correct location, and this requires reading the coordinate reference system (CRS) and ground control points (GCP) which this Pull Request has fixed (at least I think it's implemented correctly).
  2. Resampling/Interpolation - this is something this PR won't be handling, but which the user would still need to set to get a proper Sentinel-1 GRD image plotted (that is not purple/green/yellow).

I'm having trouble with the second part (resampling), which would require someone a bit smarter on the SAR data structure to get right (e.g. @scottyhq). I wish I had a proper end-to-end validated example where someone has taken a Sentinel-1 GRD image (from Microsoft Planetary Computer), read it properly with rasterio (with some set of parameters) and plotted it, but I'm missing that picture right now.

this also gets the test passing
@gjoseph92
Copy link
Owner

Oops maybe I forgot to push. I see them now in https://github.com/gjoseph92/stackstac/pull/182/files/1b3c47c6bd0d3ec9dd17b25dc8cb1c8496a131a7..d7b08203ab754295bf5644fa0a661df9fa94e8b9.

I'd be a little surprised to find that it's just a resampling issue. As long as the data types are right and we're not truncating floats to ints or something, I don't quite see why a different resampling method would change meaningful data into something completely meaningless-looking.

Furthermore, if you look at the example scene from #181 in PC explorer, you can see that what we're seeing from stackstac.show also has a different spatial extent, as well as looking meaningless. I suppose that could be a STAC metadata issue like #152 though (I haven't looked at the metadata at all).

I guess where I'm leaning right now is that if we can't get this to seem to work properly and return meaningful data, I'd rather raise a NotImplementedError if ds.crs is None than try to use GCPs but possibly return bad values.

if self.spec.vrt_params != {
"crs": ds.crs.to_epsg(),
if ds_epsg is None or self.spec.vrt_params != {
"crs": ds_epsg,
"transform": ds.transform,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MIght need to set src_transform in addition to transform as mentioned by @vincentsarago in #181 (comment)?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So looking at the code it might be a bit more complex.

What I do in rio-tiler, is automatically create a WarpedVRT when we have gcps (which in stackstac case should replace the ds) and then I create another WarpedVRT on top of it if I need warping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AttributeError: 'NoneType' object has no attribute 'to_epsg'
4 participants