
Implement TaskLoader.save when instantiated with xarray/pandas objects #84

Open
tom-andersson opened this issue Oct 19, 2023 · 0 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers)


Summary

Currently, the TaskLoader can only be `.save`d when it has been initialised with file paths in its context and target entries, not with in-memory xarray/pandas objects. This forces the user to save their normalised xarray/pandas data themselves, even when they may not care where that data lives. For example:

```python
data_processor = DataProcessor(...)
da_normalised = data_processor(da_raw)
data_processor.save("folder")
da_normalised.save("fpath.nc")  # We could potentially bypass this...
task_loader = TaskLoader(context="fpath.nc", target="fpath.nc")  # ...by instead initialising with raw xarray/pandas here...
task_loader.save("folder")  # ...and then `.save` would save the raw data objects alongside the TaskLoader config
```

We could instead initialise the TaskLoader in the typical way with raw xarray/pandas objects (which is more intuitive than file paths), and then have `TaskLoader.save` write those variables to disk alongside the JSON config (with the context/target file paths populated).
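A minimal sketch of what this could look like, independent of the actual deepsensor internals (the function name, config filename, and dispatch logic here are all assumptions, not the real API):

```python
import json
import os
import tempfile


def save_task_loader(folder, context, target):
    """Hypothetical sketch (not the real TaskLoader.save): write each in-memory
    context/target object to `folder` and record the resulting file paths in a
    JSON config, so the TaskLoader can later be re-instantiated from disk."""
    os.makedirs(folder, exist_ok=True)
    config = {"context": [], "target": []}
    for key, objs in [("context", context), ("target", target)]:
        for i, obj in enumerate(objs):
            if hasattr(obj, "to_netcdf"):  # xarray Dataset/DataArray
                fpath = os.path.join(folder, f"{key}_{i}.nc")
                obj.to_netcdf(fpath)
            elif hasattr(obj, "to_csv"):  # pandas DataFrame/Series
                fpath = os.path.join(folder, f"{key}_{i}.csv")
                obj.to_csv(fpath)
            else:  # already a file path: record it unchanged
                fpath = obj
            config[key].append(fpath)
    with open(os.path.join(folder, "task_loader_config.json"), "w") as f:
        json.dump(config, f)
    return config


# Demo with a minimal stand-in for a pandas object, so the sketch runs
# without any third-party dependencies:
class FakeSeries:
    def to_csv(self, fpath):
        with open(fpath, "w") as f:
            f.write("t,x\n0,1\n")


folder = tempfile.mkdtemp()
config = save_task_loader(folder, context=[FakeSeries()], target=["existing.nc"])
print(os.path.basename(config["context"][0]))  # → context_0.csv
print(config["target"][0])  # → existing.nc
```

The key design point is that entries which are already file paths pass through untouched, so the saved config is uniform regardless of how the TaskLoader was instantiated.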

This FR should only be implemented after #82 is closed. We don't want to save the same data multiple times just because it appears multiple times in the context and/or target entries, so we'll want to leverage whatever internal `TaskLoader` data structure is added to close #82.

Basic Example

If this feature were implemented, we'd be able to do:

```python
data_processor = DataProcessor(...)
da_normalised = data_processor(da_raw)
data_processor.save("folder")
task_loader = TaskLoader(context=da_normalised, target=da_normalised)
task_loader.save("folder")  # This saves the context and target data as NetCDF/CSV in `"folder"`
```

See the note above: we would not want to save two NetCDF files in this case, because the context and target entries point at the same object.
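The deduplication this implies can be sketched as follows, assuming the internal structure added for #82 exposes the data objects as a flat list (object identity is used here as a stand-in "same object" check):

```python
def unique_data_objects(entries):
    """Return the entries deduplicated by object identity, preserving order,
    so each unique context/target object is written to disk only once."""
    seen = set()
    unique = []
    for obj in entries:
        if id(obj) not in seen:
            seen.add(id(obj))
            unique.append(obj)
    return unique


# The same object passed as both context and target is only saved once:
da_normalised = {"name": "da_normalised"}  # stand-in for an xarray object
context, target = [da_normalised], [da_normalised]
to_save = unique_data_objects(context + target)
print(len(to_save))  # → 1
```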

Drawbacks

The user might not realise that `task_loader.save` writes data to disk, which is especially risky with very large NetCDF data and limited disk space. The documentation will need to be clear that this is what happens under the hood.
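Beyond documentation, one possible mitigation is a guard that warns before writing large objects (the helper name and threshold are hypothetical; `nbytes` is a real attribute on xarray objects):

```python
import warnings


def warn_if_large(obj, limit_bytes=10**9):
    """Hypothetical guard for TaskLoader.save: warn before writing a large
    data object to disk. xarray DataArray/Dataset expose `nbytes`."""
    nbytes = getattr(obj, "nbytes", 0)
    if nbytes > limit_bytes:
        warnings.warn(
            f"TaskLoader.save is about to write ~{nbytes / 1e9:.1f} GB to disk"
        )
    return nbytes


# Demo with a stand-in object reporting 2 GB of data:
class BigData:
    nbytes = 2 * 10**9


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_large(BigData())
print(len(caught))  # → 1
```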

Unresolved questions

No response

Implementation PR

No response

Reference Issues

No response
