Finetune on model other than GPT-2 #87

Open
raihan0824 opened this issue Sep 3, 2022 · 0 comments

raihan0824 commented Sep 3, 2022

Hello, I would be grateful if someone could answer this question clearly:
Can DialoGPT be fine-tuned on a model other than GPT-2? If so, how?
I tried to fine-tune GPT-J instead by changing line 195 of LSP_train.py from
model = load_model(GPT2LMHeadModel(config), args.init_checkpoint, args, verbose=True)
to
model = load_model(GPTJForCausalLM.from_pretrained('EleutherAI/gpt-j-6B'), args.init_checkpoint, args, verbose=True)
but I get this error:
  File "LSP_train.py", line 287, in <module>
    loss, ppl = model(input_ids, position_ids, token_ids, label_ids)
  File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/transformers/models/gptj/modeling_gptj.py", line 832, in forward
    return_dict=return_dict,
  File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/transformers/models/gptj/modeling_gptj.py", line 589, in forward
    past_length = past_key_values[0][0].size(-2)
IndexError: dimension specified as -2 but tensor has no dimensions

The script fails with this error on both GPU and CPU, but it works fine with the GPT-2 model.
I would appreciate any help!
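A likely cause, offered as a hypothesis rather than a confirmed fix: LSP_train.py line 287 calls the model positionally as model(input_ids, position_ids, token_ids, label_ids). That order matches DialoGPT's own GPT2LMHeadModel.forward (assumed here to be input_ids, position_ids, token_type_ids, lm_labels, past). Hugging Face's GPTJForCausalLM.forward, in the transformers 4.x releases I know of, takes past_key_values as its second parameter, so position_ids ends up in past_key_values. Then past_key_values[0][0] is a 0-d tensor, and .size(-2) raises exactly the IndexError above. The stdlib-only sketch below uses hypothetical stand-in functions (gptj_forward, dialogpt_forward, and the helper bind are illustrative names, not the real models) to show how Python binds the same positional call to each signature, and how keyword arguments fix the binding.

```python
import inspect

# Hypothetical stand-in mirroring the parameter order of Hugging Face's
# GPTJForCausalLM.forward (transformers 4.x), truncated after `labels`.
# It does no computation; it only exists to show argument binding.
def gptj_forward(input_ids=None, past_key_values=None, attention_mask=None,
                 token_type_ids=None, position_ids=None, head_mask=None,
                 inputs_embeds=None, labels=None):
    return None

# Hypothetical stand-in for the order assumed for DialoGPT's
# GPT2LMHeadModel.forward, which the positional call in LSP_train.py relies on.
def dialogpt_forward(input_ids, position_ids=None, token_type_ids=None,
                     lm_labels=None, past=None):
    return None

def bind(fn, *args, **kwargs):
    """Map each parameter name of `fn` to the argument it would receive."""
    return dict(inspect.signature(fn).bind(*args, **kwargs).arguments)

# The positional call from LSP_train.py line 287, with strings standing in
# for the tensors so the binding is easy to read.
call = ("input_ids", "position_ids", "token_ids", "label_ids")

dialogpt_binding = bind(dialogpt_forward, *call)
# With the GPT-J order, position_ids lands in past_key_values,
# token_ids in attention_mask, and label_ids in token_type_ids.
gptj_binding = bind(gptj_forward, *call)

# Keyword arguments make the binding independent of parameter order.
fixed_binding = bind(gptj_forward, input_ids="input_ids",
                     position_ids="position_ids",
                     token_type_ids="token_ids", labels="label_ids")

print(dialogpt_binding)
print(gptj_binding)
print(fixed_binding)
```

If the binding is indeed the problem, the corresponding change in LSP_train.py would be a keyword call such as outputs = model(input_ids=input_ids, position_ids=position_ids, token_type_ids=token_ids, labels=label_ids), reading the loss from outputs.loss. Two further differences are worth checking (assumptions, not verified against the repo): the Hugging Face model returns an output object rather than a (loss, ppl) tuple, so ppl would need to be computed separately (exp of the loss is an approximation), and the Hugging Face loss ignores label -100, whereas DialoGPT's data pipeline may pad labels with -1.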
