Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sf::st_as_text() is slow #2361

Open
kadyb opened this issue Mar 17, 2024 · 7 comments
Open

sf::st_as_text() is slow #2361

kadyb opened this issue Mar 17, 2024 · 7 comments

Comments

@kadyb
Copy link
Contributor

kadyb commented Mar 17, 2024

It seems that sf::st_as_text() is definitely slower than its alternatives available in {lwgeom} and {terra}.

library("sf"); library("lwgeom"); library("terra")

pts = matrix(1:100000, , 2)
p = st_multipoint(pts)
p = st_cast(st_sfc(p), "POINT")
p2 = vect(p)

bench::mark(
  check = FALSE,
  sf::st_as_text(p),
  lwgeom::st_astext(p),
  terra::geom(p2, wkt = TRUE)
)
#   expression                       min   median `itr/sec`
# 1 sf::st_as_text(p)              3.79s    3.79s     0.264
# 2 lwgeom::st_astext(p)        111.51ms 112.72ms     8.64 
# 3 terra::geom(p2, wkt = TRUE)  55.96ms  56.45ms    17.7  

## second example -----------------------------------------
nc = read_sf(system.file("shape/nc.shp", package = "sf"))
nc = st_as_sfc(nc)
nc2 = vect(nc)

bench::mark(
  check = FALSE,
  sf::st_as_text(nc),
  lwgeom::st_astext(nc),
  terra::geom(nc2, wkt = TRUE)
)
#   expression                        min   median `itr/sec`
# 1 sf::st_as_text(nc)           146.22ms 155.15ms      6.30
# 2 lwgeom::st_astext(nc)          1.81ms   1.86ms    528.  
# 3 terra::geom(nc2, wkt = TRUE)   2.57ms   2.62ms    380.  
@edzer
Copy link
Member

edzer commented Mar 27, 2024

Yes, maybe we could add pointers to alternatives (lwgeom) in the docs; st_as_text from sf uses native R code by default. The assumption is that nobody will use it for large jobs, but even for printing ansf object, where it is called on the first 10 features, passing a very complex polygon will take time.

Use

Sys.setenv("LWGEOM_WKT"="true")

to have sf use that path (also not documented). Making this the default would make lwgeom effectively a hard dependency. An alternative would be to use lwgeom by default only if it is present.

See also #1602 #957 #800 and #747

@kadyb
Copy link
Contributor Author

kadyb commented Mar 27, 2024

I also think it would be good idea to mention {lwgeom} in the doc.

sf/R/wkt.R

Line 20 in 8bf890e

fmt = function(x, ...) sub("^[ ]+", "", sapply(unclass(x), format, ...))

Maybe if we add fixed = TRUE argument in sub() it will be a little faster? And maybe rewriting *apply() into loop with preallocation would also bring some benefits?

@edzer
Copy link
Member

edzer commented Mar 28, 2024

fixed = TRUE is not what is meant, as `"^[ ]+" is a regular expression (one or more white spaces).

The change to default to using lwgeom is a lot of work, as the WKT produced changes and many tests in sf itself and probably in many reverse dependencies need to be adjusted.

@kadyb
Copy link
Contributor Author

kadyb commented Mar 28, 2024

Yeah, adding {lwgeom} as a hard dependency is too demanding change, so mention this package in the docs will be ok if someone is looking for a faster alternative. I'm still wondering if using a loop with preallocation would provide any benefits, as in the example below:

x = seq_len(1e6)
fun = function(x) {
  output = double(length(x))
  for (i in seq_along(x)) output[i] = sqrt(x[i])
  return(output)
}

bench::mark(
  sapply(x, sqrt),
  vapply(x, sqrt, double(1)),
  fun(x)
)
#   expression       min  median `itr/sec` mem_alloc
# 1 sapply(x, s… 418.4ms   489ms      2.04   30.89MB
# 2 vapply(x, s… 292.5ms 300.6ms      3.33    7.63MB
# 3 fun(x)        46.7ms  47.2ms     20.7     7.63MB

@edzer
Copy link
Member

edzer commented Mar 28, 2024

Feel free to try out, I would be surprised if that would reduce time meaningfully.

@kadyb
Copy link
Contributor Author

kadyb commented Mar 29, 2024

Should I prepare PR with changes in documentation or would you rather do it yourself?

@JosiahParry
Copy link
Contributor

"The assumption is that nobody will use it for large jobs, ..."

FWIW, in a previous job we took sf objects and casted the geometry column to WKT to upload into snowflake. We ran into this problem and the solution was to use wk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants