Resuming compiled model checkpoint #2003

Open
wants to merge 1 commit into
base: main
from

Conversation

mjamroz
Contributor

@mjamroz mjamroz commented Oct 23, 2023

When resuming training from a checkpoint of a compiled (--torchcompile=inductor) model, I sometimes get Missing key(s) in state_dict: "stem.conv1.c.weight", [...] Unexpected key(s) in state_dict: "_orig_mod.stem.conv1.c.weight", [...].
This PR solves the issue by stripping the _orig_mod. prefix from the checkpoint keys before loading the weights.
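
For illustration only, here is a minimal sketch of the prefix stripping described above. It is not the PR's actual diff: the helper name `strip_compile_prefix` is hypothetical, and a plain dict with dummy values stands in for a real checkpoint so the snippet runs without PyTorch.

```python
from collections import OrderedDict

# Prefix seen on the checkpoint keys in the error message above.
COMPILE_PREFIX = "_orig_mod."


def strip_compile_prefix(state_dict):
    """Return a copy of state_dict with a leading '_orig_mod.' removed from each key.

    Hypothetical helper for illustration; keys without the prefix pass through unchanged.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(COMPILE_PREFIX):
            key = key[len(COMPILE_PREFIX):]
        cleaned[key] = value
    return cleaned


# Dummy checkpoint: the first key comes from the error message,
# the second is a made-up unprefixed key to show pass-through.
checkpoint = OrderedDict([
    ("_orig_mod.stem.conv1.c.weight", 1.0),
    ("head.fc.weight", 2.0),
])
cleaned = strip_compile_prefix(checkpoint)
```

With a real checkpoint, the cleaned mapping would then be passed to the uncompiled model's `load_state_dict`.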

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@rwightman
Collaborator

@mjamroz does this cover the case of a DDP-wrapped torchcompile model? I don't know if I've actually checked that ... is the prefix _orig_mod.module. or just _orig_mod.?
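
One way to sidestep the ordering question is to strip both wrapper prefixes in whatever order they appear. The sketch below is an assumption-laden illustration, not something confirmed in this thread: `strip_wrapper_prefixes` is a hypothetical name, and it assumes DDP contributes a `module.` prefix while compilation contributes `_orig_mod.`.

```python
# Assumed wrapper prefixes: '_orig_mod.' from compilation, 'module.' from DDP.
WRAPPER_PREFIXES = ("_orig_mod.", "module.")


def strip_wrapper_prefixes(key):
    """Remove leading wrapper prefixes from a state_dict key, in any order.

    Hypothetical helper; it keeps stripping until no known prefix remains.
    """
    stripped = True
    while stripped:
        stripped = False
        for prefix in WRAPPER_PREFIXES:
            if key.startswith(prefix):
                key = key[len(prefix):]
                stripped = True
    return key


compiled_then_ddp = strip_wrapper_prefixes("module._orig_mod.stem.conv1.c.weight")
ddp_then_compiled = strip_wrapper_prefixes("_orig_mod.module.stem.conv1.c.weight")
compiled_only = strip_wrapper_prefixes("_orig_mod.stem.conv1.c.weight")
```

A caveat with this approach: a model whose own top-level submodule is literally named `module` would have that name stripped as well, so stricter matching may be preferable.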

3 participants