add DataLoader related parameters to fit() and predict() #2295

Open · wants to merge 5 commits into master
Conversation

BohdanBilonoh
Contributor

Checklist before merging this PR:

  • Mentioned all issues that this PR fixes or addresses.
  • Summarized the updates of this PR under Summary.
  • Added an entry under Unreleased in the Changelog.

Summary

Add torch.utils.data.DataLoader related parameters to fit() and predict() of TorchForecastingModel
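
For illustration, a hypothetical call once such parameters exist. The dict-style data_loader_kwargs shown here follows the naming used later in this PR's diff; the released API may differ:

from darts.datasets import AirPassengersDataset
from darts.models import NBEATSModel

series = AirPassengersDataset().load()

model = NBEATSModel(input_chunk_length=24, output_chunk_length=12, n_epochs=1)

# forward extra torch.utils.data.DataLoader options to training and prediction
model.fit(series, data_loader_kwargs={"num_workers": 2, "pin_memory": True})
pred = model.predict(n=12, data_loader_kwargs={"num_workers": 2})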


codecov bot commented Apr 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.74%. Comparing base (a0cc279) to head (ec7a01b).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2295      +/-   ##
==========================================
- Coverage   93.75%   93.74%   -0.01%     
==========================================
  Files         138      138              
  Lines       14352    14346       -6     
==========================================
- Hits        13456    13449       -7     
- Misses        896      897       +1     


@madtoinou
Collaborator

madtoinou commented May 6, 2024

Hi @BohdanBilonoh,

It looks great. However, to make it easier to maintain and more exhaustive, I think it would be better to just add a single argument called dataloader_kwargs, check that the arguments explicitly used by Darts are not redundant or overwritten, and then pass this argument down to the DataLoader constructor.

It will allow users to specify more than just prefetch_factor, persistent_workers and pin_memory, while limiting copy-pasting from other libraries' documentation (putting a link to the torch DataLoader page does sound like a good idea for this argument, however).
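
A minimal sketch of that pattern (hypothetical helper and reserved-key set, for illustration only; not the actual Darts implementation):

from typing import Any, Dict, Optional

from torch.utils.data import DataLoader

# keys that Darts sets itself; user-supplied kwargs must not override them (hypothetical set)
RESERVED_KEYS = {"batch_size", "shuffle", "drop_last", "collate_fn"}


def build_train_loader(train_dataset, batch_size, collate_fn,
                       dataloader_kwargs: Optional[Dict[str, Any]] = None) -> DataLoader:
    dataloader_kwargs = dict(dataloader_kwargs or {})
    clashing = RESERVED_KEYS & dataloader_kwargs.keys()
    if clashing:
        raise ValueError(
            f"These DataLoader arguments are set by Darts and cannot be overridden: {sorted(clashing)}"
        )
    return DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=True,
        drop_last=False,
        collate_fn=collate_fn,
        **dataloader_kwargs,  # e.g. num_workers, pin_memory, prefetch_factor, ...
    )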

PS: Apologies for taking so long with the review of this PR.

@joshua-xia

Hi @BohdanBilonoh, would you please add the multiprocessing_context parameter for DataLoader? It is useful when using multiple workers for the dataloader. Thanks!

@joshua-xia

@BohdanBilonoh refer to #2375

@joshua-xia

joshua-xia commented May 7, 2024

@BohdanBilonoh My bad, it is a good idea from @madtoinou to add dataloader_kwargs so that users can freely pass whatever DataLoader parameters they wish; there is no need to explicitly support a multiprocessing_context parameter.
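
With the generic kwargs approach, multiprocessing_context can simply be forwarded. A hypothetical example (reusing the model and series from the summary example above; the parameter name follows this PR's diff and may differ in the released API):

import multiprocessing as mp

model.fit(
    series,
    data_loader_kwargs={
        "num_workers": 4,
        # any valid torch DataLoader argument can be forwarded, including the context
        "multiprocessing_context": mp.get_context("spawn"),
    },
)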

@BohdanBilonoh
Contributor Author

BohdanBilonoh commented May 28, 2024

@madtoinou what do you think about hardcoded parameters like

batch_size=self.batch_size,
shuffle=True,
drop_last=False,
collate_fn=self._batch_collate_fn,

Should they stay hard-coded on top of the new dataloader_kwargs, like this:

def _setup_for_train(
    self,
    train_dataset: TrainingDataset,
    val_dataset: Optional[TrainingDataset] = None,
    trainer: Optional[pl.Trainer] = None,
    verbose: Optional[bool] = None,
    epochs: int = 0,
    dataloader_kwargs: Optional[Dict[str, Any]] = None,
) -> Tuple[pl.Trainer, PLForecastingModule, DataLoader, Optional[DataLoader]]:

        ...

        if dataloader_kwargs is None:
            dataloader_kwargs = {}

        dataloader_kwargs["shuffle"] = True
        dataloader_kwargs["batch_size"] = self.batch_size
        dataloader_kwargs["drop_last"] = False
        dataloader_kwargs["collate_fn"] = self._batch_collate_fn

        # Setting drop_last to False makes the model see each sample at least once, and guarantees the
        # presence of at least one batch no matter the chosen batch size
        train_loader = DataLoader(
            train_dataset,
            **dataloader_kwargs,
        )

        dataloader_kwargs["shuffle"] = False

        # Prepare validation data
        val_loader = (
            None
            if val_dataset is None
            else DataLoader(
                val_dataset,
                **dataloader_kwargs,
            )
        )

        ...

or should the user get full control over dataloader_kwargs?

@tRosenflanz
Contributor

(quoting @BohdanBilonoh's question above about hard-coding the DataLoader defaults vs. giving users full control over dataloader_kwargs)

You could extend your suggestion to allow overrides while still populating the defaults:

defaults = dict(shuffle=True, batch_size=self.batch_size, drop_last=False, collate_fn=self._batch_collate_fn)
# combine with the defaults, letting user-supplied kwargs override them
dataloader_kwargs_train = {**defaults, **dataloader_kwargs}
# force shuffle off for the validation loader
dataloader_kwargs_val = {**dataloader_kwargs_train, **dict(shuffle=False)}
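
Because later entries win in Python's dict unpacking, anything the user passes overrides the defaults, while shuffle is still forced off for validation since it is merged last. A minimal standalone demonstration (values made up for illustration):

defaults = dict(shuffle=True, batch_size=32, drop_last=False)
user_kwargs = dict(num_workers=4, batch_size=64)

# the user's batch_size wins over the default, extra keys are simply added
train_kwargs = {**defaults, **user_kwargs}
# -> {'shuffle': True, 'batch_size': 64, 'drop_last': False, 'num_workers': 4}

# shuffle is forced off for validation regardless of what the user passed
val_kwargs = {**train_kwargs, **dict(shuffle=False)}
# -> {'shuffle': False, 'batch_size': 64, 'drop_last': False, 'num_workers': 4}

print(train_kwargs)
print(val_kwargs)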

Bohdan Bilonoh and others added 2 commits May 31, 2024 13:02
Collaborator

@dennisbader dennisbader left a comment


Thanks for the updates @BohdanBilonoh, I took the chance to fix some indentation issues in one of the docstrings and pushed the changes.

If we give the freedom to overwrite our default dataloader params when calling fit(), shouldn't we then also allow that during predict()?

Also, removing the num_loader_workers parameter is a breaking change. Can you document this in the CHANGELOG.md?

@@ -1487,14 +1484,17 @@ def predict_from_dataset(
mc_dropout=mc_dropout,
)

if data_loader_kwargs is None:
Collaborator


Shouldn't we then also allow here the liberty to overwrite these defaults (except shuffle)?
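
A minimal sketch of how that could look for the prediction loader (hypothetical helper; variable names are assumptions, not the actual diff):

from typing import Any, Dict, Optional

from torch.utils.data import DataLoader


def build_pred_loader(dataset, batch_size, collate_fn,
                      data_loader_kwargs: Optional[Dict[str, Any]] = None) -> DataLoader:
    defaults = dict(batch_size=batch_size, drop_last=False, collate_fn=collate_fn)
    # user-supplied kwargs override the defaults ...
    merged = {**defaults, **(data_loader_kwargs or {})}
    # ... except shuffle, which stays off so predictions keep the order of the input samples
    merged["shuffle"] = False
    return DataLoader(dataset, **merged)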

A larger number of workers can sometimes increase performance, but can also incur extra overheads
and increase memory usage, as more batches are loaded in parallel.
data_loader_kwargs
Optionally, a dictionary of keyword arguments to pass to the PyTorch DataLoader instances used to load the
Collaborator


It's referring to the train and val datasets, but it should refer to the prediction dataset; same for the methods below.
