Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add projection information to DataArrays read from GRD files #365

Open
santisoler opened this issue Nov 25, 2022 · 13 comments
Open

Add projection information to DataArrays read from GRD files #365

santisoler opened this issue Nov 25, 2022 · 13 comments
Labels
enhancement Idea or request for a new feature
Milestone

Comments

@santisoler
Copy link
Member

Description of the desired feature:

Right before merging #348 @mdtanker noticed that the load_oasis_grd() function doesn't preserve information about the grid projection. Even though the file header has a few bytes reserved for storing reference to map projection, this bytes seem to be always equal to zero. Nevertheless, the projection information is available in the .xml files that accompany the .grd files. @mdtanker wrote a snippet that can read those files and extract the information about the grid projection.

Would be nice to add the load_oasis_grd() function the capability to read the xml files and preserve the information about the grid projection. The two things that should be preserved are:

  • they projection code/type (EPSG code, UTM zone, etc)
  • the datum (e.g. WGS84)

Since the xml should have the same filename as the .grd file we can assume their name. But the function should be able to read the .grd file even if the xml file doesn't exist. Maybe raising a warning would be nice.

Are you willing to help implement and maintain this feature? Yes, but would love to see contributors to tackle this one, specially if they have access to Oasis Montaj license and can generate sample files to test this feature.

@santisoler santisoler added the enhancement Idea or request for a new feature label Nov 25, 2022
@santisoler santisoler added this to the v0.7.0 milestone Nov 25, 2022
@markjessell
Copy link

It may be worth pointing out that the *.xml is an optional part of the geosoft system, montaj reads the projection info from the *.gi file, not the *.xml file. This doesn't matter for import, but if anyone wants to make an exporter to *.grd format then this will be an issue. In any case, for import, the xml reader would be very handy when that file is available, but we shouldn't assume it will always be available. Evren Pakyuz Charrier of Oslandia is still talking to Seequent about getting the *.gi format.

@santisoler
Copy link
Member Author

Thanks @markjessell for your input. As you mentioned, the .gi format is not available and the .xml file is the easiest way to get information about the projection. I totally agree that the .xml file should be optional: the function should work as it currently does even if the .xml file doesn't exist.

Since we are not planning on making a writer to GRD files, not having the .gi format is not a deal breaker, but would be nice to have it so we could extract more information while reading the files. We don't plan on making a writer because we think there are way better file formats for multidimensional arrays out there: like netCDF which is an open format and it's being used by many other pieces of software (GMT, Xarray, rasterio, gdal, etc), allowing a smoother interaction between our software and the rest of the ecosystem. It was a major obstacle having GRD files and not being able to read them, but once we have them in Python we already have amazing tools to store them in files.

Evren Pakyuz Charrier of Oslandia is still talking to Seequent about getting the *.gi format.

I hope these conversations to be fruitful!

@santisoler santisoler modified the milestones: v0.7.0, v0.8.0 Oct 10, 2023
@mtb-za
Copy link

mtb-za commented Mar 9, 2024

Briefly spoke to @santisoler about getting involved again, and agreed that this issue was a sensible one to get out of the way.

Did Evren get somewhere with Seequent regarding reading .gi files as well, or are we sticking with just the xml (if it exists) for now? Do we have some sample data to use for testing?

@markjessell
Copy link

markjessell commented Mar 9, 2024 via email

@mtb-za
Copy link

mtb-za commented Mar 11, 2024

Do not need anything dramatic, just a representative sample of the xml files to make sure that we are reading in the right bits of it. If the projection information is consistent then I only need one or two to make sure that I am reading the correct fields, but if there are possible variations in the fields, then examples of those (or what the variation might be).

@santisoler
Copy link
Member Author

I remember that @LL-Geo included some .xml files along with the .grd files he added for testing purposes. You can find them here: https://github.com/fatiando/harmonica/tree/main/harmonica/tests/data

@mtb-za
Copy link

mtb-za commented Mar 12, 2024

Took a look at those, and none of them have projection data. They all have

<georeference>
  <dataextents>
	  <extent2d minx="1" miny="-24" maxx="50" maxy="24"/>
  </dataextents>
  <projection type="UNKNOWN">
	  <units name="m" unit_scale="1"/>
  </projection>
</georeference>

or similar. So we will need to get some example xml files that have this information, or re-export these to have it.

It looks like we will be checking (at least?) three things in order:

  1. Whether an xml file exists at all
  2. Whether the xml file contains a georeference field
  3. Whether the georeference field has anything useful in it. We might only care about projection specifically?

Does that seem like a sensible approach?

@santisoler
Copy link
Member Author

Oh, right. I think those files didn't have a proper projection, although having "UNKNOWN" projection is something. I think we should still use them for testing this new feature.

In order to find an xml file that contains actual projections, I think NRCAN is a good resource: https://geophysical-data.canada.ca/Portal/Results

All datasets in there are open access. And actually, the load_oasis_montaj_grid function was born as a need to read a few of those files.

@mtb-za
Copy link

mtb-za commented Mar 12, 2024

I grabbed a couple of GRD files from there and found the fields that we (probably) care about.

<geo:georeference>
	<geo:dataextents>
		<geo:extent2d minx="499100" miny="5402250" maxx="519625" maxy="5413700"/>
	</geo:dataextents>
	<geo:projection type="TRANSVERSE_MERCATOR" name="NAD83 / UTM zone 21N" wellknown_epsg="26921" projection="UTM zone 21N" ellipsoid="GRS 1980" datum="NAD83" datumtrf="NAD83 to WGS 84 (1)" datumtrf_epsg="1188" datumtrf_description="[NAD83] (4m) North America - Canada and USA (CONUS, Alaska mainland)" radius="6378137" eccentricity="0.08181919104">
		<geo:shift x="0" y="0" z="0"/>
		<geo:parameters scale="0.9996" false_east="500000" false_north="0" lat_origin="0" lon_origin="-57"/>
		<geo:units name="m" unit_scale="1"/>
	</geo:projection>
</geo:georeference>

It also looks like we should be able to save the projection as a WKT when saving a netCDF from xarray. This will need some conversion from the string(s) that look to be in the xml file, but this should be doable. Just need to get my head around the cfconventions, but at least proj should be able to generate a proper WKT. Not sure if we only need the epsg code for that, or if we will need more of this information.

Failing that, I can save the above as a sensible dictionary within the xarray and let it handle things as it wishes.

Do we want to do anything with the data extents, assuming that they exist?

@LL-Geo
Copy link
Member

LL-Geo commented Mar 13, 2024

Well done! 👍

We can start with the extract of the "wellknown_epsg=26921", similar like the def extract_proj_str(fname) https://github.com/markjessell/grd_loader/blob/main/geosoft_grid_parser.py.

If that exists, we can read that. If not, we can ask an input for the function. load_oasis_grd(, epsj='')

For the data extents, these can be read from the grd file. So we can ignore these from the XML file.

@markjessell
Copy link

markjessell commented Mar 13, 2024 via email

@mtb-za
Copy link

mtb-za commented Mar 14, 2024

Thanks for those examples Mark. Looks like there are a couple keys to keep an eye out for: projection, geo:projection and <gco:CharacterString>.

The latter was in one of your files (CluffGold_tmi_rte_n_tdr.grd.xml) and seems to contain a proper WKT string. For the other two versions, your files consistently had projection while the NRCan data seems to favour geo:projection, but the content of both fields are the same and we can search for the 'wellknown_epsg' bit of that string.

@markjessell
Copy link

markjessell commented Mar 20, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Idea or request for a new feature
Projects
None yet
Development

No branches or pull requests

4 participants