Handle special column classes more elegantly #69

Open
mdsumner opened this issue Jan 19, 2018 · 5 comments

@mdsumner
Contributor

mdsumner commented Jan 19, 2018

(just some notes for now, @ateucher please assign to me)

  • can we rejig to do the data frame traversal first? Otherwise a long-running geom task can run to completion and only then fail on a data column
  • use as.Date/as.POSIXct with an explicit format string, because the internal format-guessing logic is quirky

as.Date(c("", "2001-01-01")) ## is an error but
as.Date(c("2001-01-01", "")) ## gives an NA in 2nd place
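
A minimal sketch of the explicit-format approach from the second bullet. The variable names (`x1`, `x2`, `fmt`) are illustrative only and not part of the package. With a format supplied, every element is parsed against that format, so an empty string becomes NA wherever it sits in the vector:

```r
x1 <- c("", "2001-01-01")
x2 <- c("2001-01-01", "")
fmt <- "%Y-%m-%d"  # assumed format; a real column may need a different one

# With an explicit format there is no guessing based on the first element
d1 <- as.Date(x1, format = fmt)
d2 <- as.Date(x2, format = fmt)

is.na(d1)  # TRUE FALSE
is.na(d2)  # FALSE TRUE

# The same idea for date-times, with an explicit time zone
t1 <- as.POSIXct(c("", "2001-01-01 12:00:00"),
                 format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
is.na(t1)  # TRUE FALSE
```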

@ateucher
Owner

Oh your second point is very interesting!

I'm not quite sure what you mean by doing data frame traversal first... right now it collects the information about the columns in col_classes(), and uses that information to put it back right with restore_classes() after it has gone through the mapshaper routine. It clearly needs to deal with Date/POSIXct columns better. And probably in restore_classes() find a way to catch conversion errors - make some sort of safe_as() function that traps errors and emits a warning when conversion fails rather than erroring out?
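
One possible shape for the safe_as() idea, as a rough sketch only: the signature (a vector plus a converter function) is an assumption and nothing like it exists in the package yet. It traps an error from the converter, emits a warning, and returns the column unconverted:

```r
# Hypothetical helper: apply `converter` to `x`; if the conversion errors,
# warn and return `x` unchanged instead of stopping.
safe_as <- function(x, converter) {
  tryCatch(
    converter(x),
    error = function(e) {
      warning("Column conversion failed, leaving values as-is: ",
              conditionMessage(e), call. = FALSE)
      x
    }
  )
}

# A conversion that succeeds is passed through untouched
ok <- safe_as(c("1", "2"), as.integer)

# A converter that errors no longer aborts the whole operation
always_fails <- function(v) stop("cannot convert")
kept <- suppressWarnings(safe_as(c("a", "b"), always_fails))
```

Returning the original values keeps the geometry result usable even if one column cannot be restored; returning a vector of NA instead would be an equally defensible choice.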

@mdsumner
Contributor Author

The scenario in mind is when the geometric conversion takes some time, and presumably succeeds - but then we fail on some data column - so could we fail on the data column first, before the geom operation?

I'm also guessing that a loop for the "{ }" check would be better than a "[<-" replacement, but I haven't checked yet.

It's probably not important if we get the restoration right :) but it's caught me out many times.

@ateucher
Owner

> The scenario in mind is when the geometric conversion takes some time, and presumably succeeds - but then we fail on some data column - so could we fail on the data column first, before the geom operation?

It's very possible I'm missing something, but the data column parsing has to happen after the geometric conversion because the sequence of events is: sf/sp -> geojson -> into v8 context -> mapshaper magic -> out of v8 context -> geojson -> sf/sp. It's this last geojson -> sf/sp where we are restoring column classes and that's where the failure happens.

Unless.... what if we remove all of the attributes except a row id before passing the object into the v8 context, and rejoin the attributes to the processed spatial object when done? Then we could avoid all of that messiness of storing and restoring column classes, and it would have the added benefit of making the geojson object that gets sent into the v8 context smaller (if there were lots of attributes).
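
A rough sketch of that split-and-rejoin idea in base R. A plain data.frame with a character "geometry" column stands in for an sf/sp object, the subsetting in step 2 stands in for the round trip through the v8 context, and the `rowid` column name is hypothetical:

```r
# Toy stand-in for a spatial object with attribute columns
x <- data.frame(
  name = c("a", "b", "c"),
  when = as.Date(c("2001-01-01", NA, "2003-03-03")),
  geometry = c("POLYGON A", "POLYGON B", "POLYGON C"),
  stringsAsFactors = FALSE
)

# 1. Split: set the attributes aside, keyed by a temporary row id
x$rowid <- seq_len(nrow(x))
attrs <- x[, setdiff(names(x), "geometry"), drop = FALSE]
geom_only <- x[, c("rowid", "geometry"), drop = FALSE]

# 2. Stand-in for the geometry processing: one feature dropped, order changed
processed <- geom_only[c(3, 1), , drop = FALSE]

# 3. Rejoin: look up each surviving row id in the untouched attribute table
idx <- match(processed$rowid, attrs$rowid)
out <- attrs[idx, setdiff(names(attrs), "rowid"), drop = FALSE]
out$geometry <- processed$geometry
rownames(out) <- NULL

class(out$when)  # "Date"
```

Because the attribute columns never leave R, their classes survive without any store-and-restore step.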

@mdsumner
Contributor Author

Oh true, I forget it's not a one-to-one output, you're quite right! Your last proposal sounds good, and should work.

@ateucher
Owner

ateucher commented Jan 25, 2018

I think it's going to require a bit more thought around dealing with objects that end up being aggregated/disaggregated (thinking especially of ms_dissolve(), ms_explode(), and ms_simplify(explode = TRUE), but there may be more).

ateucher added a commit that referenced this issue Sep 5, 2018
- Split attributes off from spatial and rejoin after mapshaper does its thing
ateucher changed the title from "date/time round-trip" to "Handle special column classes more elegantly" on Apr 4, 2023