Feature Request: optionally parallelize `convert` #287

juansebastianl · 2021-09-25T07:13:18Z

Hello! The `convert` function is a very simple wrapper around the read and write operations of the individual file types. For file types with either chunked APIs or `skip` and `max_rows` parameters, it would be possible to read parts of the file in parallel and write separate output files, or to hold the individual chunks in memory, combine them, and write once (writing in parallel is likely trickier). In many cases this would provide noticeable speedups at the cost of more CPU and memory usage (like csv to dta), but in others it either doesn't make sense at all or does not improve performance (for data that can't be chunked). Nevertheless, since the majority of the data used by R users follows the basic row-column specification, this would work for a lot of useful data types. I think this could be implemented with something as simple as an `n_workers` argument and `future.apply` in the background. I would love to hear thoughts on the suggestion!
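To make the idea concrete, here is a rough sketch of the "read chunks in parallel, combine in memory, write once" variant. It is not rio's implementation: `convert_chunked` is a hypothetical helper, it uses base R's `read.csv` (with its `skip` and `nrows` arguments) and `write.csv` as stand-ins for the format-specific readers and writers, and it assumes the `future` and `future.apply` packages are installed. It also assumes a CSV with a header line and no line breaks inside quoted fields, since rows are counted by counting lines.

```r
# Hypothetical helper, not part of rio: read a CSV in row chunks in parallel,
# combine the chunks in memory, and write a single output file.
library(future.apply)  # assumed installed; provides future_lapply()

convert_chunked <- function(in_file, out_file, n_workers = 2L) {
  # Count data rows (total lines minus the header line).
  # Assumption: no embedded newlines inside quoted fields.
  n_rows <- length(readLines(in_file)) - 1L
  stopifnot(n_rows > 0L)

  # Read the column names once so every chunk gets the same names.
  col_names <- names(read.csv(in_file, nrows = 1L))

  # Split the rows into roughly one chunk per worker.
  chunk_size <- ceiling(n_rows / n_workers)
  starts <- seq(0L, n_rows - 1L, by = chunk_size)

  # Run the chunk reads on background R sessions; restore the old plan on exit.
  old_plan <- future::plan(future::multisession, workers = n_workers)
  on.exit(future::plan(old_plan), add = TRUE)

  chunks <- future_lapply(starts, function(start) {
    read.csv(in_file, header = FALSE, col.names = col_names,
             skip = 1L + start,   # skip the header plus earlier chunks
             nrows = chunk_size)
  })

  # Combine in the original row order and write once (serial write).
  combined <- do.call(rbind, chunks)
  write.csv(combined, out_file, row.names = FALSE)
  invisible(out_file)
}
```

A call would look like `convert_chunked("input.csv", "output.csv", n_workers = 4L)`. One caveat with this approach is that each chunk infers its column types independently, so a real implementation would probably want to fix the column classes up front. Whether it is actually faster will depend on the file format and size, since every worker still has to scan past the rows it skips.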


chainsawriot closed this as completed May 20, 2024
