Model is not initialized correctly when path to a pretrained model is provided via pre_trained #146

Open
ThuongTNguyen opened this issue Dec 9, 2023 · 0 comments


Description

I use a script similar to cola.sh to train and/or evaluate a model for sequence classification.
Two parameters can point to model state files: init_model and pre_trained.
When pre_trained is provided, I expect the model weights to be loaded from pre_trained, while the vocabulary is still loaded based on init_model if init_model is one of the provided pretrained models.
However, the model parameters are actually loaded from init_model only. That's because the pre_trained flag has no effect in this function, although I expect pre_trained to override init_model.
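
To make the expected precedence concrete, here is a minimal sketch in plain Python. The helper `resolve_model_sources` and the `ModelSources` tuple are hypothetical names invented for illustration; they are not part of the DeBERTa code base, and the real loading function may be structured differently.

```python
from typing import NamedTuple, Optional


class ModelSources(NamedTuple):
    weights: Optional[str]  # where model parameters should be loaded from
    vocab: Optional[str]    # where the vocabulary should be resolved from


def resolve_model_sources(init_model: Optional[str],
                          pre_trained: Optional[str]) -> ModelSources:
    """Hypothetical helper: pre_trained overrides init_model for weights only."""
    # Prefer pre_trained for the weights when it is set; otherwise fall back
    # to init_model. The vocabulary always follows init_model.
    weights = pre_trained if pre_trained else init_model
    return ModelSources(weights=weights, vocab=init_model)


if __name__ == "__main__":
    # Illustrative values only; the path is a placeholder.
    print(resolve_model_sources("deberta-v3-base", "/path/to/my_model"))
```

With this rule, setting only init_model keeps the current behavior, while setting both takes the weights from pre_trained and the vocabulary from init_model.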

Steps to reproduce

  • Set init_model to deberta-v3-base
  • Set pre_trained to $PATH_TO_MY_MODEL, for example a path to the pretrained mDeBERTa-v3-base
  • Check the model parameters after loading, e.g. print(model.deberta.encoder.layer[7].output.dense.weight[:5,:4]) after this line
    • Expected result (mDeBERTa-v3-base):
      tensor([[-0.0212, 0.0130, 0.0446, 0.0156],
      [ 0.0811, 0.0023, 0.0057, -0.0301],
      [-0.0190, 0.0097, -0.0114, 0.0306],
      [ 0.0049, -0.0174, 0.0064, -0.0275],
      [-0.0152, -0.0411, -0.0166, -0.0447]], dtype=torch.float16)
    • Actual result (DeBERTa-v3-base):
      tensor([[ 0.0278, -0.0206, -0.0062, 0.0368],
      [ 0.0262, -0.0676, 0.0477, 0.0249],
      [-0.0364, 0.0453, 0.0912, 0.0590],
      [-0.0638, 0.0402, 0.0272, -0.0013],
      [-0.0352, -0.0579, 0.0320, 0.0003]], grad_fn=)
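
Instead of comparing printed slices by eye, the check above could be automated along these lines. This is only a sketch: `weights_match` is a hypothetical helper, the tolerance is an arbitrary choice, and the toy `torch.nn.Linear` layer stands in for the real model and checkpoint, whose state-dict key names are not assumed here.

```python
import torch


def weights_match(param: torch.Tensor, checkpoint_tensor: torch.Tensor,
                  atol: float = 1e-3) -> bool:
    """Return True if a loaded parameter agrees with the checkpoint tensor.

    Both tensors are cast to float32 on CPU so that a float16 checkpoint
    can be compared against a float32 model parameter.
    """
    a = param.detach().float().cpu()
    b = checkpoint_tensor.detach().float().cpu()
    # Check shapes first so allclose is never called on mismatched shapes.
    return a.shape == b.shape and torch.allclose(a, b, atol=atol)


if __name__ == "__main__":
    # Toy stand-in for the real model and checkpoint (both are assumptions).
    layer = torch.nn.Linear(4, 5)
    checkpoint = {"weight": layer.weight.detach().clone().half()}
    print(weights_match(layer.weight, checkpoint["weight"]))
```

In the real setup, `param` would be a model parameter such as model.deberta.encoder.layer[7].output.dense.weight and `checkpoint_tensor` the corresponding entry of the state dict loaded from $PATH_TO_MY_MODEL.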

Additional information/Environment

My system setup is:

  • PyTorch 1.10.0+cu113
  • GPU: NVIDIA GeForce GTX 1080 Ti