
shape mismatch when loading llava-phi path #681

Open
shockjiang opened this issue May 13, 2024 · 2 comments

@shockjiang

I tried to load the pretrained .pth of llava (hub/llava-phi-3-mini-pth/model.pth) and got this strange error:

  • Setup: DeepSpeed ZeRO stage 3 and flash-attn.
RuntimeError: Error(s) in loading state_dict for LLaVAModel:
	size mismatch for llm.model.embed_tokens.weight: copying a param with shape torch.Size([32064, 3072]) from checkpoint, the shape in current model is torch.Size([0]).
	size mismatch for llm.model.layers.0.self_attn.o_proj.weight: copying a param with shape torch.Size([3072, 3072]) from checkpoint, the shape in current model is torch.Size([0]).

Any clue?
Thanks!

@pppppM
Collaborator

pppppM commented May 14, 2024

The issue might be due to the local model not being initialized correctly.

Before loading the checkpoint, check if the model contains the key llm.model.layers.0.self_attn.o_proj.weight.
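One way to run that check is to diff the checkpoint's parameter shapes against the local model's before calling load. This is a minimal sketch, not xtuner code: the helper name and the wrapped-checkpoint handling are assumptions. A model-side shape of torch.Size([0]) typically means the parameter is still sharded/empty (e.g. under ZeRO-3) at the moment the load is attempted.

```python
import torch
import torch.nn as nn

def check_state_dict(model: nn.Module, ckpt_path: str) -> list:
    """Compare a model's parameter shapes against a checkpoint on disk.

    Returns a list of (key, model_shape, ckpt_shape) tuples for keys that
    are missing from the model or whose shapes disagree. model_shape is
    None when the key is absent from the model entirely.
    """
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # Some checkpoints nest the tensors under a "state_dict" key.
    state = ckpt.get("state_dict", ckpt)

    model_sd = model.state_dict()
    problems = []
    for key, tensor in state.items():
        if key not in model_sd:
            problems.append((key, None, tuple(tensor.shape)))
        elif model_sd[key].shape != tensor.shape:
            problems.append((key, tuple(model_sd[key].shape), tuple(tensor.shape)))
    return problems
```

If `"llm.model.layers.0.self_attn.o_proj.weight"` shows up with a model-side shape of `(0,)` (or is missing), the model was not materialized before the checkpoint load.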

@hhaAndroid
Collaborator

hhaAndroid commented May 16, 2024

@shockjiang Can you try setting this in https://github.com/InternLM/xtuner/blob/main/xtuner/configs/deepspeed/deepspeed_zero3.json ?

{
  "zero_optimization": {
    "stage3_prefetch_bucket_size": 0
  }
}
