Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent class attributes across countries #26

Open
noahdasanaike opened this issue Dec 5, 2023 · 10 comments
Open

Inconsistent class attributes across countries #26

noahdasanaike opened this issue Dec 5, 2023 · 10 comments
Assignees

Comments

@noahdasanaike
Copy link

When querying certain countries, as so:

countries <- map(c("Canada", "Mexico", "Brazil", "Argentina", "Uruguay", "Paraguay"), ~rgeoboundaries::gb_adm0(.)),

the resulting geometry columns have inconsistent class attributes (multi polygon or polygon) and therefore cannot be easily grouped together; i.e., the following call fails:

data.table::rbindlist(countries).

This may be an intended feature rather than an issue per say but there should be no harm in casting all polygons to a minimum common geometry type (like multi polygon) before returning.

@DanRunfola
Copy link
Member

DanRunfola commented Dec 6, 2023

This is surprising - in our geoBoundaries builds, we actually cast everything to multi:

self.geomDta["geometry"] = [MultiPolygon([feature]) if type(feature) == Polygon else feature for feature in self.geomDta["geometry"]]

See - https://github.com/wmgeolab/geoBoundaryBot/blob/86f05d9bd2a9df4c075b6c3c247b0ca548ef5e5c/geoBoundaryBuilder/builderClass.py#L548C17-L549C1

This is applied to every single data layer, so I'm not sure how we could have Polygons (not MultiPolygon's) slipping through.

As you point out, in our geoJSON PRY is cast to Polygon:
https://github.com/wmgeolab/geoBoundaries/raw/c0ed7b8/releaseData/gbOpen/PRY/ADM0/geoBoundaries-PRY-ADM0.geojson

We're in the middle of refactoring our build code, so I'll put this high on our list of bugs to fix. This won't be an immediate turnaround (as it impacts the actual core builds, not just the R package), and my suspicion is this is actually an upstream issue with Mapshaper, which is what we use to build the final files. If that is the case, the solution is going to be - in the build around line 875 (https://github.com/wmgeolab/geoBoundaryBot/blob/86f05d9bd2a9df4c075b6c3c247b0ca548ef5e5c/geoBoundaryBuilder/builderClass.py#L875), we'll need to add a routine that re-loads each file, recasts to multiPolygon (again?), and then re-saves. At least until we can figure out why mapshaper isn't pushing to Multi in some cases (which we will need to confirm is the case, by looking at the tmpJson that we output from geoPandas).

Also of note - we actually already reload the files after mapshaper builds them to fix a projection issue with the JSONs (lines 872-873, and then again on 887-888). Adding a multiPolygon recast there wouldn't be hard to implement at all, if needed in the near-term to fix this. I'll look into doing that for a short-term bandaid.

@rohith4444 see the above notes for our package development and k8s refactor.

@DanRunfola
Copy link
Member

I have made a small edit to the build, and am going to try and run one this evening to see if it at least gives a short-term resolution:
https://github.com/wmgeolab/geoBoundaryBot/blob/d8f844252c754e194645a1409b37b3cb507ae974/geoBoundaryBuilder/builderClass.py#L889
https://github.com/wmgeolab/geoBoundaryBot/blob/d8f844252c754e194645a1409b37b3cb507ae974/geoBoundaryBuilder/builderClass.py#L873

        tmpGeomJSONproj_simplified_multi = tmpGeomJSONproj_simplified.cast("MultiPolygon")

This is then written out with the correct EPSG codes in the next step.

This is a pretty hacky / short-term solution, but may give us an immediate resolution here, and then we can work on a better/longer term fix during our refactor.

I'll try to remember to loop back to this tomorrow morning after a build is run with the new code.

@DanRunfola
Copy link
Member

DanRunfola commented Dec 7, 2023

Update: this wasn't quite as simple as I had thought, so trying some new solutions this AM. Current test - after mapshaper json warp - is reloading and running:

        def to_multipolygon(geom):
            if isinstance(geom, Polygon):
                return MultiPolygon([geom])
            return geom
tmpGeomJSONproj_multi = tmpGeomJSONproj.geometry.apply(to_multipolygon)

If this goes, I'll do a full build this AM.

@DanRunfola
Copy link
Member

Ok - that seems to do it, so moving forward with this bandaid fix.
Of note, this fixes geoJSONs. I need to take a closer look at the R package to confirm that's the primary file type it uses (not shape).

@noahdasanaike
Copy link
Author

Thanks so much, @DanRunfola!

@DanRunfola
Copy link
Member

Just FYI, this fix is now in for our geoJSONs, but I need to do a little more testing before I can apply it to the shapefiles, which the R package relies on. Hoping to have time to do that in the coming day or two (this afternoon with some real luck).

@DanRunfola
Copy link
Member

To followup here - this is actually trickier than originally thought, as the "simple" fix to change all types results in corrupted attribute tables. I have isolated the issue to our use of mapshaper for our simplification, and we will probably need to look into deprecating that in favor of some of our own algorithms to really "fix" this, which may be a ways off.

Open to any suggestions on how we might make this easier, but I don't have any "quick" fixes in mind right this moment. Mapshaper is just too important for the process to do a drop-in replacement (it's wicked fast!).

@noahdasanaike
Copy link
Author

I would imagine that simply calling res <- sf::st_cast(res, "MULTIPOLYGON") on line 10 after res <- st_read(path, quiet = quiet) in geoboundaries.R would work.

@DanRunfola
Copy link
Member

We'll give that a shot - just pinged @rohith4444 to prioritize this one.

@rohith4444
Copy link
Collaborator

Hi @noahdasanaike,

Thank you for bringing this to us. I've implemented the suggested modification to cast all polygons to a minimum common geometry type (like multipolygon) before returning.

I've tested the functionality with the query you provided countries <- map(c("Canada", "Mexico", "Brazil", "Argentina", "Uruguay", "Paraguay"), ~rgeoboundaries::gb_adm0(.)) , and I didn't encounter any errors. It seems like the issue has been resolved. Please feel free to try it out on your end and let me know if you encounter any further issues.

Best regards,
Rohith

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants