🚀 Feature
Save a separate .pth file at each checkpoint instead of overwriting checkpoint.pth every time.
Motivation
It is often useful to see how the model performs at each epoch or save point. For example, when training an LLM, we want to measure its generative capabilities after each epoch and see whether they are improving.
Example: after epoch 1 it saves checkpoint_ep01.pth
after epoch 2 it saves checkpoint_ep02.pth
When loading the model back in according to the config, it would by default load sorted(glob("checkpoint_ep*"))[-1], i.e. the last epoch, so the behavior stays the same as it is now.
Alternatively, if save_best_only=true, keep the current behavior of saving to checkpoint.pth?
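The naming and loading scheme proposed above could look roughly like this. It is a minimal sketch assuming all checkpoints live in one flat output directory; the helper names checkpoint_path and latest_checkpoint are hypothetical, and the actual serialization (e.g. torch.save) is left out so the example runs with the standard library alone.

```python
import glob
import os
from typing import Optional


def checkpoint_path(output_dir: str, epoch: int, save_best_only: bool) -> str:
    """Return the file a checkpoint for `epoch` would be written to.

    Hypothetical helper: with save_best_only, keep the single overwritten
    checkpoint.pth; otherwise write one zero-padded file per epoch.
    """
    if save_best_only:
        return os.path.join(output_dir, "checkpoint.pth")
    # Zero-padding keeps lexicographic order equal to numeric order (up to ep99).
    return os.path.join(output_dir, f"checkpoint_ep{epoch:02d}.pth")


def latest_checkpoint(output_dir: str) -> Optional[str]:
    """Pick the checkpoint to load by default.

    Returns the last per-epoch file if any exist, else the legacy
    checkpoint.pth if present, else None.
    """
    per_epoch = sorted(glob.glob(os.path.join(output_dir, "checkpoint_ep*.pth")))
    if per_epoch:
        return per_epoch[-1]
    legacy = os.path.join(output_dir, "checkpoint.pth")
    return legacy if os.path.exists(legacy) else None
```

Falling back to checkpoint.pth keeps older experiment folders loadable. Note that the sorted(glob(...)) approach relies on zero-padding: with two digits it is correct up to epoch 99, after which parsing the epoch number out of the filename would be safer.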
We didn't do that by default because model weights take up a lot of disk space.
We could make it a separate setting that additionally saves all checkpoints. What do you think?
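The opt-in variant suggested here (keep today's single checkpoint.pth by default, and only additionally keep per-epoch files when a setting is enabled) might be sketched as follows. The setting name save_all_checkpoints and the save_checkpoint helper are hypothetical; save_fn stands in for whatever actually writes the weights.

```python
import os
import shutil
from typing import Callable, List


def save_checkpoint(
    save_fn: Callable[[str], None],
    output_dir: str,
    epoch: int,
    save_all_checkpoints: bool = False,
) -> List[str]:
    """Write checkpoint.pth as today; optionally keep a per-epoch copy.

    save_fn stands in for the real serializer (for example
    lambda p: torch.save(model.state_dict(), p)). save_all_checkpoints is a
    hypothetical opt-in setting whose default preserves current behavior.
    """
    main_path = os.path.join(output_dir, "checkpoint.pth")
    save_fn(main_path)  # default behavior: overwrite the single checkpoint
    written = [main_path]
    if save_all_checkpoints:
        # Copy the file instead of serializing twice; this costs one extra
        # checkpoint's worth of disk space per epoch.
        epoch_path = os.path.join(output_dir, f"checkpoint_ep{epoch:02d}.pth")
        shutil.copyfile(main_path, epoch_path)
        written.append(epoch_path)
    return written
```

Because the default is False, disk usage is unchanged unless the user explicitly opts in.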
Most research papers only train for 1 epoch, sometimes 2. If the user knows what they're doing and wants to enable it, I think it's a nice option, especially since it's simple to implement.