
Full-parameter fine-tuning of ChatGLM2 on Ascend 910B fails with an error #3788

Closed
belle9217 opened this issue May 17, 2024 · 6 comments
Labels
good first issue (Good for newcomers), solved (This problem has been already solved.)

Comments

@belle9217

Reminder

  • I have read the README and searched the existing issues.

Reproduction

The bug is shown in the screenshot below.

Expected behavior

No response

System Info

No response

Others

[screenshot: error traceback]

@hiyouga hiyouga added the pending (This problem is yet to be addressed.) label May 17, 2024
@hunterhome

[INFO|modeling_utils.py:4170] 2024-05-20 17:25:15,119 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

[INFO|modeling_utils.py:4178] 2024-05-20 17:25:15,119 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /root/.cache/modelscope/hub/ZhipuAI/chatglm3-6b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
[INFO|modeling_utils.py:3719] 2024-05-20 17:25:15,124 >> Generation config file not found, using a generation config created from the model config.
05/20/2024 17:25:15 - INFO - llamafactory.model.utils.checkpointing - Gradient checkpointing enabled.
05/20/2024 17:25:15 - INFO - llamafactory.model.utils.attention - Using vanilla attention implementation.
05/20/2024 17:25:15 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
05/20/2024 17:25:15 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
05/20/2024 17:25:15 - INFO - llamafactory.model.loader - trainable params: 1949696 || all params: 6245533696 || trainable%: 0.0312
[INFO|trainer.py:626] 2024-05-20 17:25:15,521 >> Using auto half precision backend
[INFO|trainer.py:2048] 2024-05-20 17:25:15,881 >> ***** Running training *****
[INFO|trainer.py:2049] 2024-05-20 17:25:15,881 >> Num examples = 1,000
[INFO|trainer.py:2050] 2024-05-20 17:25:15,881 >> Num Epochs = 3
[INFO|trainer.py:2051] 2024-05-20 17:25:15,882 >> Instantaneous batch size per device = 2
[INFO|trainer.py:2054] 2024-05-20 17:25:15,882 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:2055] 2024-05-20 17:25:15,882 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2056] 2024-05-20 17:25:15,882 >> Total optimization steps = 186
[INFO|trainer.py:2057] 2024-05-20 17:25:15,884 >> Number of trainable parameters = 1,949,696
0%| | 0/186 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/data/anaconda3/envs/llama_factory/bin/llamafactory-cli", line 8, in
sys.exit(main())
File "/data/LLaMA-Factory/src/llamafactory/cli.py", line 65, in main
run_exp()
File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 34, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/data/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 73, in run_sft
train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
loss = self.compute_loss(model, inputs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
outputs = model(**inputs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/peft/peft_model.py", line 1129, in forward
return self.base_model(
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
return self.model.forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 941, in forward
transformer_outputs = self.transformer(
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 834, in forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 631, in forward
layer_ret = torch.utils.checkpoint.checkpoint(
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 489, in checkpoint
ret = function(*args, **kwargs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 544, in forward
attention_output, kv_cache = self.self_attention(
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 408, in forward
query_layer = apply_rotary_pos_emb(query_layer, rotary_pos_emb)
NotImplementedError: Unknown device for graph fuser

@statelesshz
Contributor

@hunterhome ChatGLM uses torch.jit, which torch-npu does not support; you can comment out the corresponding torch.jit decorators (see the sketch below).
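
For reference, the edit looks roughly like this (a minimal sketch; `apply_rotary_pos_emb` is the function named in the traceback above, and the exact signature may differ between ChatGLM releases — comment out every `@torch.jit.script` occurrence in the file the same way):

```python
# modeling_chatglm.py -- disable the TorchScript decorator so the function
# runs in eager mode; torch-npu has no graph fuser for scripted functions.

# @torch.jit.script   # <- comment this decorator out
def apply_rotary_pos_emb(x: torch.Tensor, rope_cache: torch.Tensor) -> torch.Tensor:
    ...  # function body unchanged
```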

@statelesshz
Copy link
Contributor

[screenshot illustrating the change]

@statelesshz
Contributor

statelesshz commented May 27, 2024

cc @belle9217

@hiyouga hiyouga added the solved (This problem has been already solved.) label and removed the pending (This problem is yet to be addressed.) label May 27, 2024
@MengqingCao
Contributor

One more piece of information: do not comment out the torch.jit decorators in the ~/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py file shown in the traceback; instead, modify the corresponding file in the repo downloaded from ModelScope or Hugging Face. For a ModelScope download, for example, that is ~/.cache/modelscope/hub/ZhipuAI/chatglm3-6b/modeling_chatglm.py. A scripted version of the edit is sketched below.
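
If you prefer to script the edit, a minimal sketch (assuming the default ModelScope cache path above; adjust the path for a Hugging Face download):

```python
from pathlib import Path

# One-shot patch of the downloaded repo copy; the transformers_modules file
# in the traceback is derived from this one, so edit the source, not the cache.
path = Path.home() / ".cache/modelscope/hub/ZhipuAI/chatglm3-6b/modeling_chatglm.py"
source = path.read_text()
patched = source.replace("\n@torch.jit.script", "\n# @torch.jit.script")
if patched != source:  # idempotent: skips rewriting if already patched
    path.write_text(patched)
```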

@hiyouga hiyouga added the good first issue (Good for newcomers) label May 28, 2024
@hiyouga hiyouga closed this as completed May 28, 2024
@hunterhome

hunterhome commented May 31, 2024 via email
