Make TaskLoader a generator(?) #24

Open
tom-andersson opened this issue Jul 8, 2023 · 0 comments
Labels
thoughts welcome Discussion and feedback is appreciated

Comments

@tom-andersson
Collaborator

In conventional DL training interfaces, the data loader is typically an iterable (often a generator object): iterating over it yields batches of data to pass to the model.

We could make the TaskLoader a generator to adhere to this convention. However, my main concern is that the TaskLoader.__call__ method is extremely flexible. This reflects the flexibility of NPs as probabilistic models that can take any data as context and any data as target, so there are many ways you might want to sample your raw data to generate Tasks for training. This raises the question of how next(task_loader) should sample the xarray/pandas dataset objects to produce the context and target data for the Tasks if the user has not specified this explicitly. Which date should be sliced, and what sampling strategy should be used for the context/target data?

One option would be to set TaskLoader attributes, such as a list of train_dates to loop over when generating Tasks, plus the context_sampling and target_sampling strategies. Alternatively, context_sampling, target_sampling, and any additional TaskLoader.__call__ kwargs could be passed at generation time.
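To make the first option concrete, here is a minimal sketch of a wrapper that stores a list of dates and fixed sampling kwargs, then calls the loader once per date when iterated. The names `DateLoopingLoader` and `fake_task_loader` are hypothetical and only for illustration. The stand-in loader returns a plain dict and merely assumes a `(date, context_sampling=..., target_sampling=...)` call shape; it is not the real TaskLoader.

```python
from typing import Any, Callable, Iterator, Sequence


class DateLoopingLoader:
    """Hypothetical wrapper: iterate a callable task loader over fixed dates."""

    def __init__(
        self,
        task_loader: Callable[..., Any],
        train_dates: Sequence[Any],
        **sampling_kwargs: Any,
    ) -> None:
        self.task_loader = task_loader
        self.train_dates = list(train_dates)
        # Sampling options are fixed once here, then reused for every date.
        self.sampling_kwargs = sampling_kwargs

    def __len__(self) -> int:
        return len(self.train_dates)

    def __iter__(self) -> Iterator[Any]:
        # A generator method: each `for` loop (e.g. each epoch) starts afresh.
        for date in self.train_dates:
            yield self.task_loader(date, **self.sampling_kwargs)


def fake_task_loader(date, context_sampling="all", target_sampling="all"):
    """Stand-in for a task loader (assumption): records what it was asked for."""
    return {
        "date": date,
        "context_sampling": context_sampling,
        "target_sampling": target_sampling,
    }


loader = DateLoopingLoader(
    fake_task_loader,
    train_dates=["2020-01-01", "2020-01-02"],
    context_sampling=10,
    target_sampling="all",
)
tasks = list(loader)
```

Note that with this design `next(loader)` would not work directly; callers would need `next(iter(loader))`, which is also how most iterable data loaders behave. The sketch also shows the trade-off: the sampling options are frozen at construction time, so varying them per Task needs extra machinery.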

IMO it is safer to have the user explicitly pass and control these sampling options by calling the TaskLoader.__call__ method directly to generate batches of Task objects for training. However, if there is a clear benefit to being able to loop over a TaskLoader and a clean way to implement it, then this is worth considering. I'm open to discussion on this.
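For comparison, a rough sketch of the explicit-call pattern, where the user's own loop decides the date and sampling strategy for each Task. Again, `fake_task_loader` and `make_training_tasks` are hypothetical stand-ins, and the choice of a random number of context points per date is just one example of a strategy a user might want.

```python
import random


def fake_task_loader(date, context_sampling="all", target_sampling="all"):
    """Stand-in for a task loader (assumption): records what it was asked for."""
    return {
        "date": date,
        "context_sampling": context_sampling,
        "target_sampling": target_sampling,
    }


def make_training_tasks(task_loader, dates, rng):
    """Call the loader explicitly, choosing sampling options per date."""
    tasks = []
    for date in dates:
        # Example strategy: a different random number of context points per Task.
        n_context = rng.randint(1, 50)
        tasks.append(
            task_loader(date, context_sampling=n_context, target_sampling="all")
        )
    return tasks


rng = random.Random(0)  # seeded so the sketch is reproducible
train_tasks = make_training_tasks(
    fake_task_loader, ["2020-01-01", "2020-01-02", "2020-01-03"], rng
)
```

Here nothing about the sampling is hidden in loader state: every choice is visible at the call site, at the cost of the user writing the loop themselves.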

cc @jonas-scholz123

@tom-andersson tom-andersson added the help wanted Extra attention is needed label Jul 8, 2023
@tom-andersson tom-andersson added thoughts welcome Discussion and feedback is appreciated and removed help wanted Extra attention is needed labels Aug 23, 2023