
Apache Arrow? #13

Open
kbvernon opened this issue Jul 3, 2023 · 2 comments


kbvernon commented Jul 3, 2023

Hi, I didn't want to barge in on #12, so I thought I would raise this as a separate issue. Have you considered using an Arrow table to hold your graph object rather than a base R data.frame? In addition to the better memory management, there are other advantages to going the Arrow route, like storing metadata. If this is something you think is worthwhile, I would be happy to help implement at least the R side of it - I'm afraid the C++ side of things is a little bit beyond me at the moment.

I'm coming at this from ecology, where the conversion from a grid to a graph can be prohibitively costly in terms of memory.

Also, a somewhat related question: why did you choose to represent the numeric vertex IDs as character strings? Is there some advantage to this in your C++ code?

This is a great package, btw. Thanks!

Owner

vlarmet commented Jul 4, 2023

Hi,

No, I have not considered using Arrow. To be honest, I don't know exactly what it is but I will look at it.
Internally, vertex IDs are integers from 0 to nbnode-1. Character type is more flexible for the user because anything can be converted into character.
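
For readers following along, here is a minimal base R sketch of the kind of mapping described above: arbitrary character labels on the user side, integers from 0 to nbnode-1 underneath. The edge list and the names `dict`, `from_id` and `to_id` are hypothetical, and this is not taken from the cppRouting source; it only illustrates the idea.

```r
# Hypothetical edge list with arbitrary character vertex IDs
edges <- data.frame(
  from = c("a", "b", "a", "c"),
  to   = c("b", "c", "c", "d"),
  cost = c(1.5, 2.0, 4.0, 1.0),
  stringsAsFactors = FALSE
)

# Dictionary of unique vertex labels, in order of first appearance
dict <- unique(c(edges$from, edges$to))
nbnode <- length(dict)

# 0-based integer IDs, running from 0 to nbnode - 1
from_id <- match(edges$from, dict) - 1L
to_id   <- match(edges$to, dict) - 1L

# Translating back to the user's labels only needs the dictionary
labels_back <- dict[from_id + 1L]
```

Because the dictionary is built from character values, any ID type that can be coerced with `as.character()` (integers, factors, strings) can go through the same path, which seems to be the flexibility being referred to.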

Author

kbvernon commented Jul 4, 2023

Hi, thanks!

I'm not sure I fully understand the Arrow system either, but playing around with it, I can say it is VERY fast, like data.table fast, and it makes a table VERY small in memory, which is really useful when you're working with a graph that has N > 1 million vertices and 8 * N edges.

library(arrow)
library(cppRouting)
library(lobstr)

roads_dataframe <- read.csv(
  "https://raw.githubusercontent.com/vlarmet/cppRouting/master/data_readme/roads.csv",
  colClasses = c("character", "character", "numeric")
)

roads_arrow <- arrow::arrow_table(roads_dataframe)

lobstr::obj_size(roads_dataframe)
# 29.59 MB

lobstr::obj_size(roads_arrow)
# 284.64 kB
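
One caveat on the comparison above: Arrow allocates its column buffers outside R's heap, so `lobstr::obj_size()` on an Arrow table most likely measures only the R-side wrapper object, not the data itself. A rough sketch of a fairer comparison follows, using the table's `$nbytes()` method to ask Arrow for its own buffer size. The synthetic edge list and the names `edges`, `r_bytes` and `arrow_bytes` are assumptions standing in for roads.csv, so no particular ratio is implied.

```r
library(arrow)

# Synthetic edge list standing in for roads.csv (hypothetical data)
n <- 1e5
edges <- data.frame(
  from   = as.character(sample.int(n, n, replace = TRUE)),
  to     = as.character(sample.int(n, n, replace = TRUE)),
  weight = runif(n),
  stringsAsFactors = FALSE
)

edges_arrow <- arrow::arrow_table(edges)

# Bytes held in R's heap by the data.frame
r_bytes <- as.numeric(object.size(edges))

# Bytes held in Arrow's own buffers, which R's size functions do not see
arrow_bytes <- edges_arrow$nbytes()

c(data_frame = r_bytes, arrow_table = arrow_bytes)
```

Arrow may still come out smaller for character-heavy tables, since it stores strings in contiguous buffers rather than as individual R objects, but the gap should be judged from `$nbytes()` rather than from the size of the wrapper.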
