Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace rgdal for serializing to/from geojson #41

Closed
sckott opened this issue Apr 21, 2015 · 22 comments
Closed

Replace rgdal for serializing to/from geojson #41

sckott opened this issue Apr 21, 2015 · 22 comments
Assignees
Milestone

Comments

@sckott
Copy link
Collaborator

sckott commented Apr 21, 2015

Look and see if can just depend on a small javascript library for this - doing in pure R doable, but slow

@sckott sckott self-assigned this Apr 21, 2015
@sckott
Copy link
Collaborator Author

sckott commented Apr 23, 2015

working on the js branch - using a js library to see if we can do away with gdal

@sckott sckott added this to the v0.2 milestone Apr 28, 2015
@sckott sckott changed the title Do we really need rgdal and friends for serializing to geojson Replace rgdal for serializing to/from geojson Apr 28, 2015
@sckott sckott modified the milestones: v0.3, v0.2 Apr 7, 2016
@javrucebo
Copy link

With a good part of the performance bottleneck of geojson_json removed in #85 and the discussion about disk round-trip in #77 the question is indeed if further speed improvements can be made by not using writeOGR.

For well-defined objects serialization is not too hard, so I drafted a function to do it for SpatialPolygons(DataFrame), which is a fairly straightforward format. It uses Rcpp and for simplicity of testing I put it in an inline Rcpp::cppfunction. Some effort has been taken to produce output with is compatible with rgdal, so it can be uses as a verification of correct output. The checks I have seen to fail so far are due to rgdal handling encoding sometimes wrongly.

The function itself you can find at https://gist.github.com/javrucebo/fbbfe4b504d49893ad03d8fe0bd84f73

And I have written some code to benchmark/verify: https://gist.github.com/javrucebo/da86e7bf0fb4f5367e2b405c3bed1e54. The resuling geojson will differ from the one in rgdal in terms of decimals/precision, therefore I tested the numerics and the structure separately.

Benchmark from above code for some SpatialPolygons:

Not sure if that is a road you would want to go down ... for me it does the job and improves further the elapsed time, in particular when I need geojson strings to simplify maps with @ateucher's rmapshaper

bench_sptogeojson(raster::getData('GADM', country='VAT', level=0))
# Unit: seconds
#                        expr     min      lq        mean  median      uq     max neval
#                 gejson_json 0.03580 0.03580 0.035799645 0.03580 0.03580 0.03580     1
#  SpatialPolygons_to_geojson 0.00433 0.00433 0.004333479 0.00433 0.00433 0.00433     1
bench_sptogeojson(raster::getData('GADM', country='UKR', level=1))
# Unit: seconds
#                        expr   min    lq      mean median    uq   max neval
#                 gejson_json 1.020 1.020 1.0184167  1.020 1.020 1.020     1
#  SpatialPolygons_to_geojson 0.274 0.274 0.2737755  0.274 0.274 0.274     1
bench_sptogeojson(raster::getData('GADM', country='SWE', level=2))
# Unit: seconds
#                        expr  min   lq     mean median   uq  max neval
#                 gejson_json 3.77 3.77 3.765165   3.77 3.77 3.77     1
#  SpatialPolygons_to_geojson 1.25 1.25 1.251481   1.25 1.25 1.25     1
bench_sptogeojson(raster::getData('GADM', country='CAN', level=3))
# Unit: seconds
#                        expr  min   lq     mean median   uq  max neval
#                 gejson_json 58.1 58.1 58.07351   58.1 58.1 58.1     1
#  SpatialPolygons_to_geojson 15.6 15.6 15.64825   15.6 15.6 15.6     1

@sckott
Copy link
Collaborator Author

sckott commented Jun 14, 2016

Thanks very much for this @javrucebo

Have you tested the function with lots of different spatialpolygon inputs? Curious how well it works with diverse inputs of SpatialPolygon type

@ateucher
Copy link
Member

This is awesome, but I have the same concern. rgdal (and especially gdal on which it's based) are really well established and have taken into account most corner cases one can run into...

@sckott
Copy link
Collaborator Author

sckott commented Jun 14, 2016

@jeroenooms and I talked about possibility of simply converting the S4 spatial objects like SpatialPolygons to json via jsonlite and doing any manipulations on the object to make a proper geojson object - We havent tried it yet, but could be worth trying

@javrucebo
Copy link

Regarding first question of @sckott: Yes I did try it with many different inputs.

  • First of all a big number of GADM input, as you get there all sorts of combinations with holes, thousands of islands etc + encoding in many different languages.
  • Couple of downloads from naturalearth, some pre-packaged Polygons (e.g. from rworldmap and tmap)
    Result:
    Most differences between the output from rgdal and the "new" function are
  • due to numerical precision (already mentioned this), but actually I am not fully convinced of rgdal here, as for example you see in the data sections things like ISO_N3=4 converted to "ISO_N3": 4.000000
  • due to different use of Polygon vs. Multipolygon (still need to figure out when exactly rgdal is falling back to Polygon only.
  • due to encoding issues (sometimes rgdal is not writing out proper UTF-8 but e.g. "Non-ISO extended-ASCII" which is not valid JSON and sometimes the strings seem not right even in that encoding).
  • I have one case with rwrldxtra which I would still need to investigate if its worth to follow the approach in the first place.

For @ateucher' comment: I certainly agree, it's always safer to rely on heavily tested packages then to take some risk. But

  1. as pointed out also rgdal has some issues especially on the data front (changing numerical ID's, encoding) + it does not allow for non-numeric ID's (which SpatialPolygonsDataFrame does by default). [currently in my function I replicate rgdal behaviour ... but this could easily be changed.
  2. the SpatialPolygonsDataFrame format is a fairly simple format. I'd like to see some "edge" cases for this format ... as I do not see much more than having a list of lists of Polygons and taking some attention on holes ... but when building the object this is already taking care of [if you do not forcefully manually manipulate slots later on ... but that might not be liked by rgdal either].

And @sckott's second comment: Indeed, as I said: the format of SpatialPolygonsDataFrame is pretty simple; I do not know how jsonlite is working behind the scenes so difficult to jugde how fast it can get. But I check again also my original R-only replacement function which for medium-sized polgons was still faster than writeOGR (and I was not yet using jsonlite for the coordinates but a slow paste operation to replicate "exactly" the rgdal data format.

In a nutshell: I think writeOGR is just too slow (disregarding of the disk round-trip) and this will likely not be solved at the source, so a rewrite for sp-classes might be worth ... no matter which route to go down [and a pure R solution might be more attractive]

@jeroen
Copy link
Member

jeroen commented Jun 14, 2016

I think a package to convert between sp classes and geojson strings will be relatively straightforward to implement with jsonlite, and will likely be just as fast as going via rgdal.

@ateucher
Copy link
Member

Another thing to think about is sfr, where they are implementing spatial classes using simple features specification, and presumably thinking about conversions between old sp classes, the new spatial R classes, and other spatial formats.

@sckott
Copy link
Collaborator Author

sckott commented Jun 15, 2016

@ateucher AFAICT though (correct me if I'm wrong) sfr will still depend on GDAL - which is the source of pain we're trying to avoid - did that project get funding? can't tell if it did

I realize there's a tradeoff here between well tested libraries like GDAL and painless setup/installation for users. For me personally, I'd like to have a solution that is robust and covers nearly all cases, but is also portable across all platforms (and never gives travis hell, as GDAL does)

Thanks @javrucebo for your detailed notes. Seems like your first function does work pretty generally.

Seems like there are two routes we can go down:

  • C/C++: continue with the start above and finish off for all geojson classes
  • pure R w/ jsonlite: manipulate sp classes into geojson structure, then convert to json, or similar

Either route we take, probably makes sense to make a separate package for this task of converting sp classes to geojson (which I think would mean only depending on sp and possibly rgeos, which has additional spatial classes, the GEOS C library is I think far easier to deal with) - then we can use it in this package

any thoughts on how to proceed?

@ateucher
Copy link
Member

Yup, it was funded - see second last item here. I do think it will depend on GDAL. I just know that rgdal goes through some machinations for conversion, because the current sp classes don't follow the simple features spec - specifically with respect to polygons holes.

It probably does make sense to make a lightweight package for the particular task of converting between sp and geojson to avoid all the overhead of rgdal. As for the best approach, I'm not the best person to say - but @javrucebo has a really nice start. What would be the benefits of jsonlite?

@sckott
Copy link
Collaborator Author

sckott commented Jun 15, 2016

What would be the benefits of jsonlite?

I think simplicity is the main reason. And the work done in jsonlite is C based, so its fast. BUT, if we can't do with just jsonlite and pure R manipulations, then makes sense to drop down to C

@ateucher
Copy link
Member

Makes sense to me

@espinielli
Copy link

GDAL TravisCI pain is resolved by @edzer in the sf package, see its .travis.yml.

I got inspired by the above myself for a bookdown experiment where I need to read Shapefiles/Topojson/...and it works...of course the whole build time has gone from ~4min to ~20min.

Another option to reduce Travis pain/build time is to define a docker image and use that in Travis.
Maybe @yihui's r-docker repo can be of inspiration...

HTH

@yihui
Copy link

yihui commented Oct 3, 2016

@espinielli You should cache the _bookdown_files directory instead of _book, because knitr's cache is stored under _bookdown_files. https://bookdown.org/yihui/bookdown/github.html

@sckott
Copy link
Collaborator Author

sckott commented Oct 3, 2016

rgdal is still a pain, even if easier to install on Travis, I still want to avoid it if possible (assuming equally fast replacement code)

@edzer
Copy link

edzer commented Oct 3, 2016

The pain is IMO not with (r)gdal, but with travis: it doesn't let you add the ubuntugis-unstable repo, which has gdal 2.1 compiled.

@sckott
Copy link
Collaborator Author

sckott commented Oct 3, 2016

@edzer that would be nice if travis would allow that

@espinielli
Copy link

maybe Travis's Trusty can be of some help...

@yihui thanks...I think I got it from other bookdown examples/repos, I'll check and feedback...still learning (BTW super great work)

@edzer
Copy link

edzer commented Oct 5, 2016

Here is a trusty travis file that uses gdal-dev binaries; it is at version 1.10, which is enough for rgdal and rgeos to compile. For sfr 1.10 is currently not enough, but if travis waiting time is your only problem, this might solve it.

As explained here, the sp classes do not map 1:1 to the 7 simple features of geojson: sp does not distinguish between POLYGON and MULTIPOLYGON (and, as noted in this thread, doesn't register holes with the rings they fall in, rgeos needed for that), and also not between LINESTRING and MULTILINESTRING.

@sckott sckott modified the milestones: v0.4, v0.3 Nov 18, 2016
@sckott sckott modified the milestones: v0.4, v0.5 Aug 15, 2017
@sckott sckott modified the milestones: v0.5, v0.4 Aug 15, 2017
@sckott sckott modified the milestones: v0.5, v0.6 Oct 24, 2017
@ateucher
Copy link
Member

@sckott so we can push the more important fixes out the door, I'm going to suggest we move this to the v0.7 milestone

@sckott sckott modified the milestones: v0.6, v0.7 Mar 24, 2018
@sckott sckott modified the milestones: v0.7, 0.8 Apr 23, 2019
@sckott sckott removed this from the v0.8 milestone Oct 24, 2019
@sckott sckott added this to the v0.9 milestone Nov 28, 2019
@sckott
Copy link
Collaborator Author

sckott commented Nov 28, 2019

rgdal replaced with sf, work in 1ea0bd3

@sckott sckott closed this as completed Nov 28, 2019
@github-actions
Copy link

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants