
shape mismatch when loading llava-phi path #681

Open
shockjiang opened this issue May 13, 2024 · 2 comments

@shockjiang

I tried to load the pretrained .pth of llava (hub/llava-phi-3-mini-pth/model.pth) and got this strange error:

  • Setup: DeepSpeed ZeRO stage 3 and flash-attn.
RuntimeError: Error(s) in loading state_dict for LLaVAModel:
	size mismatch for llm.model.embed_tokens.weight: copying a param with shape torch.Size([32064, 3072]) from checkpoint, the shape in current model is torch.Size([0]).
	size mismatch for llm.model.layers.0.self_attn.o_proj.weight: copying a param with shape torch.Size([3072, 3072]) from checkpoint, the shape in current model is torch.Size([0]).

Any clue?
Thanks!

@pppppM
Collaborator

pppppM commented May 14, 2024

The issue might be due to the local model not being initialized correctly.

Before loading the checkpoint, check if the model contains the key llm.model.layers.0.self_attn.o_proj.weight.
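One way to run that check is to diff the checkpoint's parameter shapes against the local model's before calling load. This is a minimal sketch, not xtuner code: the helper name and the wrapped-checkpoint handling are assumptions. A model-side shape of torch.Size([0]) typically means the parameter is still sharded/empty (e.g. under ZeRO-3) at the moment the load is attempted.

```python
import torch
import torch.nn as nn

def check_state_dict(model: nn.Module, ckpt_path: str) -> list:
    """Compare a model's parameter shapes against a checkpoint on disk.

    Returns a list of (key, model_shape, ckpt_shape) tuples for keys that
    are missing from the model or whose shapes disagree. model_shape is
    None when the key is absent from the model entirely.
    """
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # Some checkpoints nest the tensors under a "state_dict" key.
    state = ckpt.get("state_dict", ckpt)

    model_sd = model.state_dict()
    problems = []
    for key, tensor in state.items():
        if key not in model_sd:
            problems.append((key, None, tuple(tensor.shape)))
        elif model_sd[key].shape != tensor.shape:
            problems.append((key, tuple(model_sd[key].shape), tuple(tensor.shape)))
    return problems
```

If `"llm.model.layers.0.self_attn.o_proj.weight"` shows up with a model-side shape of `(0,)` (or is missing), the model was not materialized before the checkpoint load.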

@hhaAndroid
Collaborator

hhaAndroid commented May 16, 2024

@shockjiang Can you try setting this in https://github.com/InternLM/xtuner/blob/main/xtuner/configs/deepspeed/deepspeed_zero3.json ?

{
  "zero_optimization": {
    "stage3_prefetch_bucket_size": 0
  }
}
