Data Loading Efficiency #1

OmegaDroid · 2020-08-11T11:35:06Z

Currently the runners require the data to be provided on a line-by-line basis.

This means we lose efficiency whenever the runner could load the data faster itself, for example by reading a CSV directly with pandas or dask.

It would be nice if a runner could elect to receive the data stream instead of individual rows, falling back to the row-by-row implementation if the stream isn't available.

Maybe we add csv_stream, sql_stream, etc. to the DataConnection, which would raise NotImplementedError by default. Alternatively, a connector could provide a data type and a stream object.
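A rough sketch of the first option, to make the fallback concrete. Everything here apart from the names DataConnection and csv_stream is an assumption for illustration: the rows() method, the two example connection classes, and the load() helper are hypothetical and not taken from the existing code base. The standard-library csv module stands in for a faster loader such as pandas.

```python
import csv
import io
from typing import IO, Dict, Iterable, Iterator, List, Tuple


class DataConnection:
    """Hypothetical base class; the project's real DataConnection may differ."""

    def rows(self) -> Iterator[Dict[str, str]]:
        # Existing row-by-row access (method name assumed for this sketch).
        raise NotImplementedError

    def csv_stream(self) -> IO[str]:
        # Default: no stream available, so runners fall back to rows().
        raise NotImplementedError


class InMemoryCsvConnection(DataConnection):
    """Example connector that can offer both rows and a raw CSV stream."""

    def __init__(self, text: str) -> None:
        self._text = text

    def rows(self) -> Iterator[Dict[str, str]]:
        return csv.DictReader(io.StringIO(self._text))

    def csv_stream(self) -> IO[str]:
        return io.StringIO(self._text)


class RowOnlyConnection(DataConnection):
    """Example connector that only supports row-by-row access."""

    def __init__(self, records: Iterable[Dict[str, str]]) -> None:
        self._records = list(records)

    def rows(self) -> Iterator[Dict[str, str]]:
        return iter(self._records)


def load(connection: DataConnection) -> Tuple[str, List[Dict[str, str]]]:
    """Prefer the CSV stream; fall back to rows if it is not implemented."""
    try:
        stream = connection.csv_stream()
    except NotImplementedError:
        return "rows", list(connection.rows())
    # A pandas-based runner could pass `stream` to pandas.read_csv here.
    return "stream", list(csv.DictReader(stream))
```

With this shape, a runner that only understands rows keeps working unchanged, and a connector opts in to streaming simply by overriding csv_stream.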
