
Feature Request: optionally parallelize convert #287

Closed
juansebastianl opened this issue Sep 25, 2021 · 0 comments

Comments

@juansebastianl

Hello! The convert function is a very simple wrapper around the read and write operations for the individual file types. For file types with either a chunked API or a skip/max_rows parameter, it would be possible to read parts of the file in parallel and then either write separate output files, or hold the individual chunks in memory, combine them, and write once (writing in parallel is likely trickier). In many cases this would provide a noticeable speedup at the cost of higher CPU and memory usage (e.g. csv to dta), while in others it either doesn't make sense at all or doesn't improve performance (data that can't be chunked). Still, since most data used by R users follows a basic row-column layout, this would cover a lot of useful formats. I think it could be implemented with something as simple as an n_workers argument and future.apply in the background. I would love to hear thoughts on the suggestion!
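To make the idea concrete, here is a minimal sketch of what this could look like, assuming a CSV-to-Stata conversion. The `convert_parallel()` function, its `chunk_size` argument, and the chunking scheme are illustrative assumptions, not rio's actual API; only the `n_workers` idea and the use of future.apply come from the proposal above.

```r
# Hypothetical sketch: read a CSV in chunks across workers with
# future.apply, combine in memory, and write a single .dta file.
library(future.apply)
library(data.table)
library(haven)

convert_parallel <- function(in_file, out_file, n_workers = 2L, chunk_size = 1e5L) {
  future::plan(future::multisession, workers = n_workers)

  # Count data rows (first column only) to derive chunk offsets.
  n_rows  <- nrow(fread(in_file, select = 1L))
  offsets <- seq(0L, n_rows - 1L, by = chunk_size)

  # Read the header once so every chunk gets consistent column names.
  header <- names(fread(in_file, nrows = 0L))

  # Read chunks in parallel; each worker skips the header plus its offset.
  # (Naive line-based chunking; quoted fields with embedded newlines
  # would break this and need a proper chunked reader.)
  chunks <- future_lapply(offsets, function(off) {
    fread(in_file, skip = off + 1L, nrows = chunk_size,
          header = FALSE, col.names = header)
  })

  # Combine in memory and write once, since parallel writes are trickier.
  write_dta(rbindlist(chunks), out_file)
}

# convert_parallel("big.csv", "big.dta", n_workers = 4)
```

Writing one intermediate file per chunk and concatenating afterwards would avoid holding everything in memory, at the cost of a final merge step; either variant fits behind an n_workers argument.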
