
Impl SaveToNpz for Optimizers #785

Open
jafioti opened this issue May 15, 2023 · 2 comments · May be fixed by #815

Comments

jafioti (Contributor) commented May 15, 2023

On long training runs it's necessary to checkpoint models in case something goes wrong partway through. When only the model is saved and the optimizer state is reset, training takes much longer to resume. It would therefore make sense to save and load the optimizer as well. I believe this can be implemented with the same npz architecture that is currently implemented for TensorCollection modules.
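
For concreteness, here is a rough sketch of the checkpoint/resume flow being requested. The model half assumes dfdx's existing `SaveToNpz`/`LoadFromNpz` traits (behind the `numpy` feature, as of roughly dfdx 0.11); the `opt.save`/`opt.load` calls are the *proposed* API from this issue and do not exist yet, so they are left commented out:

```rust
use dfdx::prelude::*;

fn main() {
    let dev: Cpu = Default::default();
    let mut model = dev.build_module::<Linear<2, 5>, f32>();
    let mut opt = Adam::new(&model, AdamConfig::default());

    // Checkpoint mid-run: today only the model's tensors can be persisted.
    model
        .save("checkpoint_model.npz")
        .expect("failed to save model");
    // Proposed (hypothetical): also persist optimizer state
    // (e.g. Adam's moment estimates and step count).
    // opt.save("checkpoint_opt.npz").expect("failed to save optimizer");

    // Resume: reload the weights. Without the optimizer state, Adam's
    // moment estimates restart from zero, which is what slows down
    // resumed training.
    model
        .load("checkpoint_model.npz")
        .expect("failed to load model");
    // Proposed (hypothetical):
    // opt.load("checkpoint_opt.npz").expect("failed to load optimizer");

    let _ = opt; // opt would be used by the training loop from here
}
```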

coreylowman (Owner) commented

Related to #101

nkoppel (Contributor) commented Jul 13, 2023

I think I have an approach that will work well for this (a sketch follows the list):

  1. Allow converting Gradients to and from Models to make serializing them easier.
  2. Create new optimizer objects that store their Gradients as Models.
  3. Serialize these objects instead of the optimizers themselves.
  4. Remove the TensorCollection supertrait bound from the serialization traits so that we can create a custom implementation for optimizers.
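
A minimal, self-contained illustration of steps 1–3, using stand-in types rather than dfdx's real `Gradients`/`TensorCollection` machinery (every name here is hypothetical): gradient buffers keyed by parameter id are converted into a model-shaped value, and the optimizer stores its per-parameter state in that same shape, so serializing the optimizer reduces to serializing a few extra "models":

```rust
use std::collections::HashMap;

/// Stand-in for a tensor: just a flat buffer of f32s.
type Tensor = Vec<f32>;

/// Stand-in for dfdx's `Gradients`: buffers keyed by parameter id.
struct Gradients {
    grads: HashMap<String, Tensor>,
}

/// A toy "model" with named parameters; a model-shaped value like this
/// is exactly what the existing npz serialization path knows how to handle.
#[derive(Default)]
struct LinearModel {
    weight: Tensor,
    bias: Tensor,
}

impl Gradients {
    /// Step 1: convert `Gradients` into a model-shaped value so it can
    /// reuse the existing per-parameter serialization.
    fn to_model(&self) -> LinearModel {
        LinearModel {
            weight: self.grads.get("weight").cloned().unwrap_or_default(),
            bias: self.grads.get("bias").cloned().unwrap_or_default(),
        }
    }

    /// ...and back, so a deserialized checkpoint can repopulate the optimizer.
    fn from_model(m: &LinearModel) -> Self {
        let mut grads = HashMap::new();
        grads.insert("weight".to_string(), m.weight.clone());
        grads.insert("bias".to_string(), m.bias.clone());
        Gradients { grads }
    }
}

/// Step 2: an optimizer whose state (e.g. Adam's two moments) is stored as
/// model-shaped values; step 3 is then just serializing this struct's fields
/// alongside the real model.
#[allow(dead_code)]
struct AdamState {
    moment1: LinearModel,
    moment2: LinearModel,
}

fn main() {
    let mut grads = Gradients { grads: HashMap::new() };
    grads.grads.insert("weight".into(), vec![0.1, 0.2]);
    grads.grads.insert("bias".into(), vec![0.3]);

    // Round-trip Gradients through the model-shaped representation.
    let as_model = grads.to_model();
    let round_trip = Gradients::from_model(&as_model);
    assert_eq!(round_trip.grads["weight"], vec![0.1, 0.2]);
    println!("gradients round-tripped through a model-shaped value");
}
```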

nkoppel linked a pull request (#815) on Jul 13, 2023 that will close this issue.