Data Loading Efficiency #1

Open
OmegaDroid opened this issue Aug 11, 2020 · 0 comments

@OmegaDroid
Contributor

Currently, runners require the data to be provided line by line.

This costs us efficiency whenever a runner could load the data faster on its own, for example by reading a CSV directly with pandas or dask.

It would be nice if a runner could elect to receive the data stream instead of individual rows, falling back to the row-by-row implementation when a stream isn't available.

Maybe we add csv_stream, sql_stream, etc. to the DataConnection, each raising NotImplementedError by default. Alternatively, a connector could provide a data type and a stream object.
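
To make the first option concrete, here is a rough sketch of how the fallback might look. Everything in it is an assumption for illustration: the `rows()` method, the two connector classes, and the `load()` helper are hypothetical names, and the real DataConnection interface may look quite different. It uses the standard library `csv` module so it runs without extra dependencies.

```python
import csv
import io
from typing import IO, Iterator, List


class DataConnection:
    """Hypothetical base class; the real DataConnection API may differ."""

    def rows(self) -> Iterator[List[str]]:
        """Row-by-row access, the path runners rely on today."""
        raise NotImplementedError

    def csv_stream(self) -> IO[str]:
        """Optional bulk access; connectors that can expose raw CSV override this."""
        raise NotImplementedError


class InMemoryCsvConnection(DataConnection):
    """Illustrative connector that supports both access paths."""

    def __init__(self, text: str) -> None:
        self._text = text

    def rows(self) -> Iterator[List[str]]:
        return csv.reader(io.StringIO(self._text))

    def csv_stream(self) -> IO[str]:
        return io.StringIO(self._text)


class RowOnlyConnection(DataConnection):
    """Illustrative connector that can only yield rows."""

    def __init__(self, data: List[List[str]]) -> None:
        self._data = data

    def rows(self) -> Iterator[List[str]]:
        return iter(self._data)


def load(connection: DataConnection) -> List[List[str]]:
    """Runner-side loading: prefer the stream, fall back to rows."""
    try:
        stream = connection.csv_stream()
    except NotImplementedError:
        # The connector has no CSV stream, so use the existing row-by-row path.
        return list(connection.rows())
    # A pandas-based runner could hand `stream` to pandas.read_csv here instead.
    return list(csv.reader(stream))
```

With this shape, existing connectors keep working unchanged, and a connector only has to override `csv_stream` to opt in to the faster path. The same pattern would apply to `sql_stream`.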
