Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better facilitate piping with sp inputs #42

Open
ateucher opened this issue Jul 8, 2016 · 12 comments
Open

Better facilitate piping with sp inputs #42

ateucher opened this issue Jul 8, 2016 · 12 comments

Comments

@ateucher
Copy link
Owner

ateucher commented Jul 8, 2016

For any of the methods for sp objects, the function converts it to json (character) before sending it into the V8 context along with commands for processing, and a json character object is returned from the V8 context. This is then converted back to an sp object before being returned to the user.

If the function is being used in a magrittr pipe chain with other rmapshaper functions, it should not perform the final conversion to sp, rather it should be able to return a json object to be passed to the next rmapshaper function in the chain so that the expensive sp -> json -> sp operation doesn't need to happen at each step. Finally, in the last step in the chain, the resulting object should be converted to sp to be returned to the user.

Current behaviour

SpatialPolygonsDataFrame_object %>%
ms_simplify() %>% # sp obj is converted to json, simplified, then converted to sp
ms_filter_islands() %>%  # sp obj converted to json, islands are filtered, converted to sp
ms_lines() # sp obj is converted to json, converted to lines, converted to sp and returned

Preferred behaviour

SpatialPolygonsDataFrame_object %>%
ms_simplify() %>% # sp obj is converted to json, simplified, returned as json
ms_filter_islands() %>%  # islands are filtered, returned as json
ms_lines() # gets converted to lines, then converted to sp and returned

This could be achieved by:

  1. The function detects that is being piped into another rmapshaper function and so doesn't convert the return value to sp, rather returns a json character string to be piped into the next function. The last function in the chain knows that it is last and thus converts and returns an sp object

  2. Have a return_class argument that has default class(input), but can be set to json or one of the Spatial classes depending where in the chain it is:

    SpatialPolygonsDataFrame_object %>%
    ms_simplify(return_class = 'json') %>% 
    ms_filter_islands() %>% 
    ms_lines(return_class = 'SpatialLinesDataFrame')
  3. Just advise users to convert to json before doing a pipe, as I currently do in the vignette and README:

    SpatialPolygonsDataFrame_object %>%
    geojsonio::geojson_json() %>%
    ms_simplify() %>% 
    ms_filter_islands() %>% 
    ms_lines() %>%
    geojsonio::geojson_sp()
@gaborcsardi
Copy link

@ateucher I guess this is subjective. Personally, I am still not convinced. One issue is debugging. Debugging pipes is not a good experience imo, and then if you write the equivalent pipe-less solution, the function will run different code, so the error might not come up. Just one example of how non-pure functions are dangerous.

@timelyportfolio
Copy link
Contributor

@ateucher, could you preserve your v8 context with each call, so that the transformed data is available for the next mapshaper call and could be reused if provided some argument?

@smbache
Copy link

smbache commented Jul 8, 2016

Surely, @gaborcsardi has a point, and there's sometimes a tradeoff between a "clean" DSL. That said it is surely possible, as it is almost the same as what jqr is doing. There the json query is built up by each verb and only the last one in the chain will trigger the actual jq interpreter. I guess you need to consider how you feel about the pros and cons and make up your mind.

@timelyportfolio
Copy link
Contributor

There is some discussion on daff about how to accomplish preserving V8. Not having to initiate v8 context each time also eliminates quite a bit of overhead.

@gaborcsardi
Copy link

Also, how do you know what the next function in the pipeline expects as input? That function may come from another package or wherever. Do you check if it is coming from your package? I think this'll get really tricky to implement and probably also fragile. But I might miss the easy workaround....

@smbache
Copy link

smbache commented Jul 8, 2016

Maybe as default return_class = auto() where auto will determine whether it is within a pipeline and whether it is the last such...

@gaborcsardi
Copy link

Yes, but even it is not last, it might need to return and sp object, if the next function in the chain does not know about this trick. Anyway, I guess you can work around this somehow....

@ateucher
Copy link
Owner Author

ateucher commented Jul 8, 2016

@gaborcsardi another good point about debugging. And yes, it would need to be able to detect whether the next function in the pipe was from rmapshaper. It does seem like a potential house of cards...

@timelyportfolio I think even more efficient than preserving the v8 context would be to pass the original unaltered json all the way through, collecting and assembling mapshaper commands as we go (perhaps in an attribute), then apply the whole thing at once in the last step - similar to what @smbache mentioned wrt jqr.

Thanks all for your thoughts! I will think on it some more and poke around in jqr, but am leaning towards @gaborcsardi's advice and try not to be to clever ;)

@ateucher
Copy link
Owner Author

ateucher commented Jul 8, 2016

Maybe as default return_class = auto() where auto will determine whether it is within a pipeline and whether it is the last such...

That seems like a pretty nice way to allow user override on a bit of a black-box process...

@gaborcsardi
Copy link

I think a better (but still tricky) way of solving these kind of problems, is to pass an object that can behave both as json and sp.

This might not be possible at this point, but in general, you can have an R6 object that has two active bindings: json and sp, but internally it might only store one of them. Then if a function requests json, and only sp is stored, it calculates json from sp on the fly (and potentially stores it). And vice-versa.

@smbache
Copy link

smbache commented Jul 9, 2016

+1 for R6 idea

@ateucher
Copy link
Owner Author

That is intriguing! I'll probably keep it simple until I see how sfr and geojson play out, but thanks for the ideas!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants