Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconstruct CRS from WKT only #461

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

gerritholl
Copy link
Collaborator

Load an AreaDefinition from a CF NetCDF file when the grid mapping variable is encoding only the Coordinate Reference System (CRS) in Well Known Text (WKT), but lacks other metadata.

Work in progress.

Add a test for loading an area from a CF NetCDF file encoded only with a
CRS WKT.  The test currently fails as I did not yet implement anything.
@gerritholl gerritholl self-assigned this Oct 13, 2022
Refactor out _get_type_of_grid_mapping in utils/cf.py in preparation for
getting the grid mapping from the WKT / CRS directly.  Tolerate absence
of grid mapping name if there is a crs_wkt attribute.
@ghiggi
Copy link
Contributor

ghiggi commented Oct 14, 2022

@gerritholl pin me when this is merged or mostly done so that I can test it on some of my use cases.
In meantime, I am drafting a PR refactoring the CFWriter internals to then create:

  • a CF-Compliant xr.Dataset (cleaner than the current scn.to_xarray_dataset())
  • a CF-Compliant ZarrWriter
  • a xr.Dataset satpy accessor to generate satpy scene from xr.Dataset.

I will wait your PR before proceeding ....

@ghiggi
Copy link
Contributor

ghiggi commented Oct 14, 2022

Side note:
If the Satpy Scene is on a SwathDefinition and we use the CFWriter, no grid_mapping attribute nor crs coords information is attached to netCDF. Maybe you can enforce the correct behavior in the context of this PR.
Going further away, when resampling a Scene from AreaDef to a SwathDef, the grid_mapping attribute should also be updated ...

@gerritholl
Copy link
Collaborator Author

@ghiggi I am putting the work for this PR on hold. For what I need to achieve, I can and will use pytroll/satpy#2236 instead. Therefore, fixing pytroll/satpy#2226 for the equirectangular projection is no longer a priority from my side.

@TomLav
Copy link
Contributor

TomLav commented Sep 28, 2023

I see that this PR seems stalled. If someone starts working on this again later, I want to note that the CF convention (CF-1.10) clearly states that:

The crs_wkt attribute is intended to act as a supplement to other single-property CF grid mapping attributes (as described in Appendix F); it is not intended to replace those attributes. If data producers omit the single-property grid mapping attributes in favour of the crs_wkt attribute, software which cannot interpret crs_wkt will be unable to use the grid_mapping information. Therefore the CRS should be described as thoroughly as possible with the single-property grid mapping attributes as well as by crs_wkt.

Thus, if we should load a CRS from wkt only, it should not be done silently by from_cf() or load_cf_area(). I think in the case described above, the user should actively (e.g. using a keyword) bypass decoding the single-property CF attributes, and rather use the WKT encoding.

@djhoese
Copy link
Member

djhoese commented Sep 28, 2023

@TomLav I disagree with this.

If data producers omit the single-property grid mapping attributes in favour of the crs_wkt attribute

This is specific to data producers, not consumers. We are consumers in this case. If the WKT does not match the single-property CF attributes then that is a fault of the data producer or of the CF format as a whole. Edit: We are allowed to just use the WKT attribute as consumers.

I think the above point is pretty clear, but I'd also like to add that I strongly disagree with CF's choice of making these separate CF attributes and for making crs_wkt optional and supplemental. WKT is fully descriptive and the CF attributes are not. The CF attributes (and the CF standard) do not support every projection and the standard as a whole has to be updated to support new projections when they come along. Conversations in the xarray ecosystem and the OGC Zarr world have started leaning towards the idea of being CF-compatible-ish because of this and only using the CRS WKT attribute in their zarr standards.

@TomLav
Copy link
Contributor

TomLav commented Sep 28, 2023

Hei @djhoese, I am not unaware of the history and tensions between the different communities around the CF attributes and the WKT definitions. Maybe these things will converge with time.

I was not saying that our from_cf() should bark and not load the crs from WKT, but I do think that this should not happen silently either. I do not know if this should be controlled via a keyword or a warning, but I think there is value for a routine called from_cf() to have the option to strictly follow the CF convention. We can e.g. have a from_cf(strict=True|False).

This being said pyresample load_cf_area() only checks if the :grid_mapping_name attribute is present and has a valid value, it then diverts to pyproj.from_cf() the decoding of the crs attributes and/or wkt. If pyproj adopts a loose approach, pyresample will inherit it silently.

You also state that you disagree with a quote that I took from the CF convention, which is fine, but there is little point discussing the CF convention here.

@djhoese
Copy link
Member

djhoese commented Sep 28, 2023

You also state that you disagree with a quote that I took from the CF convention, which is fine, but there is little point discussing the CF convention here.

No, sorry. I disagree with your interpretation of the quote. The standard is saying that we can load from WKT without looking at the single-property attributes (as consumers of the data, not producers).

but I do think that this should not happen silently either.

Why not? The CF convention gives us the crs_wkt attribute as an option. We can use it if it is there. There is no need to warn the user if we use it or don't use it. It is there to be used (and is more accurate).

to have the option to strictly follow the CF convention

We are following the CF convention by being a consumer of the data and using the crs_wkt attribute. There is nothing against the CF convention by doing this.

@TomLav
Copy link
Contributor

TomLav commented Sep 28, 2023

I see. We disagree on the interpretation of the CF convention.

You understand from the convention that files where the CRS is only specified with wkt are CF compliant.

I understand the convention that the CRS must first be defined in their single-attribute form, then we can duplicate and enrich with WKT.

For reference, my interpretation stems from the sentence: The crs_wkt attribute is intended to act as a supplement to other single-property CF grid mapping attributes (as described in Appendix F); it is not intended to replace those attributes. What follows with data producers does not change my interpretation of this first sentence.

I don't think we can settle this from Pytroll, this must have been discussed somewhere at CF, I will give this a look.

For the record: if it is so that CF allows a crs to only be defined by its WKT then of course from_cf() should silently do its job and load this. If CF does not allow this, I still think we should help the user and load the CRS, but we should also give the user the control to use non-strict checks. Do you agree?

@djhoese
Copy link
Member

djhoese commented Sep 28, 2023

You understand from the convention that files where the CRS is only specified with wkt are CF compliant.

I understand the convention that the CRS must first be defined in their single-attribute form, then we can duplicate and enrich with WKT.

Nope. I understand that PRODUCERS must always include the single-property attributes. The CRS WKT is supplemental.

BUT there is nothing saying you/we/pyresample, as CONSUMERS of the data, should not use the CRS WKT. If it is there, we can use it. The WKT will never be less accurate than the CF attributes. So if the CRS WKT and single-property attributes are both available, we should always use/prefer the WKT...as CONSUMERS.

@TomLav
Copy link
Contributor

TomLav commented Sep 28, 2023

For later reference, it seems that CF (as of CF-1.10) will not accept a crs variable defined only by its wkt description.

If the text of the convention is possibly ambiguous (see discussion above), the conformance document is clear (*):

The grid mapping variables must have the grid_mapping_name attribute. The legal values for the grid_mapping_name attribute are contained in Appendix F.

Which still leaves us all latitude as to what pyresample should do in such cases. As I stated above I believe it should try to find its way around and produce the area_def. But I wanted to clarify (at least for my own sake) what was CF-compliant or not(*).

I see that pyproj.CRS.to_cf() happily loads the CRS whether its definition uses only the WKT description (case B below), or both the CF-attribute and WKT descriptions (case A), or only the CF-attribute description (case C). The current behaviour of pyproj seems to be to load the CRS from crs_wkt if present (A and B) and decode the CF attributes only if there are no crs_wkt (case C). I think this is the behaviour you want for pyresample, @djhoese, so all good.

from copy import copy
from pyproj import CRS

a = area_def.crs.to_cf()
b = {'crs_wkt': a['crs_wkt']}
c = copy(a)
c.pop('crs_wkt')

crsA = CRS.from_cf(a)
crsB = CRS.from_cf(b)
crsC = CRS.from_cf(c)

print(crsA)
print(crsB)
print(crsC)

Thus, it should be easy for pyresample load_cf_area() to work smoothly in all 3 cases, we just need to decide how to warn/inform the user that we found a non-compliancy but marched on (in case B).

Turning this raise into a warning might be all we need to do.

(*) of course the conformance document could be wrong and not be aligned with CF convention. But what counts to me is that the conformance checkers will implement the conformance document.

@djhoese
Copy link
Member

djhoese commented Sep 29, 2023

As mentioned on slack from our discussion yesterday and today, after looking at the pyresample CF code it is clear pyresample is doing a lot more validation of CF standards than I thought it was. In those cases it makes sense to have a strict keyword argument (I'd prefer a default of False I think) that essentially "guesses" at whatever information it needs (which variable is the x coordinate, which one the y, etc). I still didn't agree with the idea that the WKT should not be used as the primary source of CRS information if it exists. Meaning the single-property attributes would only be depended on if the crs_wkt didn't exist. I don't think this is wrong in a CF-sense and I think it makes more sense as WKT will never be less accurate than the CF attributes. It is just an unfortunate situation that the CF standard has put us in. So in summary a strict keyword argument that when True checks if there is only a crs_wkt and raises an error would be OK with me. If False then pyresample just tries its best to get the CRS information. In general we will depend on pyproj to parse the CF grid mapping variable and produce whatever the best CRS is, but we can have additional checks for CF validation when strict=True.

I would be ok with changing that raise to a warning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reconstruct CRS from WKT only
4 participants