Describe the bug
The process of resuming training of a previously trained model is unintuitive.

To Reproduce
Steps to reproduce the behavior:
The issue can easily be encountered by slightly modifying Tutorial 1. Follow the tutorial normally until you reach the "Visualizing the Samples" section. Before that section, add the following code cell:
Expected behavior
This should continue training the model for an additional 10 generations on CUDA, or 5 generations otherwise. However, because the Trainer's epochs parameter represents the total number of epochs to train, the call returns without doing any further training. To achieve the expected behavior, the user must instead create a new Trainer object and pass (current_epochs + desired_additional_epochs) as the value of the epochs parameter. This is unintuitive, and it requires the user to manually keep track of how many epochs have been completed when ending a training session they plan to continue later.
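To make the reported semantics concrete, here is a minimal, self-contained sketch of the behavior described above. The Trainer class below is a hypothetical stand-in for the library's actual class (the real one takes models, optimizers, and data); it only reproduces the "epochs means total, fixed at creation" logic and the workaround.

```python
# Hypothetical stand-in for the library's Trainer, illustrating only the
# "epochs is a total budget set at creation" behavior described in this issue.
class Trainer:
    def __init__(self, epochs):
        self.epochs = epochs        # TOTAL number of epochs to train, fixed here
        self.current_epoch = 0      # epochs completed so far

    def __call__(self):
        # Trains only while current_epoch < epochs; once the budget is
        # exhausted, calling the trainer again is a silent no-op.
        while self.current_epoch < self.epochs:
            self.current_epoch += 1  # one epoch of training would happen here

trainer = Trainer(epochs=5)
trainer()   # trains 5 epochs
trainer()   # no-op: the total budget of 5 is already spent

# Workaround: create a NEW Trainer whose total is completed + desired extra
# epochs. The user must track current_epoch manually across sessions.
resumed = Trainer(epochs=trainer.current_epoch + 10)
resumed.current_epoch = trainer.current_epoch
resumed()   # now runs the 10 additional epochs
```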
Desktop (please complete the following information):
OS: Windows 10 Pro, version 21H2
Installation
Pip
Additional context
Fixing this would involve rewriting significant portions of the BaseTrainer class. I would suggest letting the user pass the number of epochs to train to the __call__() function, rather than having it fixed at object creation.
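One possible shape of the suggested interface, again as a simplified sketch (names and signature are illustrative, not the library's actual API): epochs becomes an argument to __call__() and is interpreted as "train this many more epochs", so resuming needs no bookkeeping by the user.

```python
# Sketch of the proposed API: epochs moves from __init__ to __call__ and
# means "additional epochs from wherever training currently stands".
class BaseTrainer:
    def __init__(self):
        self.current_epoch = 0  # epochs completed so far

    def __call__(self, epochs):
        target = self.current_epoch + epochs
        while self.current_epoch < target:
            self.current_epoch += 1  # one epoch of training would happen here

trainer = BaseTrainer()
trainer(epochs=5)    # initial training session: 5 epochs
trainer(epochs=10)   # resumes seamlessly: 10 more, 15 total
```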