Reproduction
CUDA_VISIBLE_DEVICES=1 llamafactory-cli example/......
Below is the YAML file:

### model
model_name_or_path: /home/ybh/ybh/models/internlm2-chat-20b
quantization_bit: 4

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: wqkv

### dataset
dataset: text_classification_coarse
template: intern2
cutoff_len: 6144
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: /home/ybh/ybh/nlpcc/LLaMA-Factory/saves/internlm2-chat-20b/qlora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 0.0001
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_steps: 0.1
fp16: true

### eval
val_size: 0.1
per_device_eval_batch_size: 1
evaluation_strategy: steps
eval_steps: 10
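As a side note, `lora_target: wqkv` must match actual submodule names in the checkpoint (internlm2 fuses the q/k/v projections into a single `wqkv` linear layer). When a target keyword matches nothing, PEFT attaches no adapters and the trainable-parameter count comes out as 0, which also produces a "does not require grad" failure. A quick way to check which modules a keyword matches, sketched here on a hypothetical toy module since loading the 20B model is impractical:

```python
import torch.nn as nn

def find_target_modules(model: nn.Module, keyword: str) -> list[str]:
    """Return the names of all nn.Linear submodules whose name contains `keyword`."""
    return [name for name, module in model.named_modules()
            if keyword in name and isinstance(module, nn.Linear)]

# Toy model mimicking an internlm2-style block with a fused wqkv projection
# (hypothetical structure, for illustration only).
toy = nn.ModuleDict({
    "attention": nn.ModuleDict({
        "wqkv": nn.Linear(8, 24),
        "wo": nn.Linear(8, 8),
    }),
    "feed_forward": nn.ModuleDict({"w1": nn.Linear(8, 16)}),
})

print(find_target_modules(toy, "wqkv"))  # → ['attention.wqkv']
```

The log below reports 2,621,440 trainable parameters, so the targets did match in this run; this check is mainly useful when the count comes out as 0.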
Expected behavior
No response
System Info
[INFO|trainer.py:2048] 2024-05-18 00:07:10,006 >> ***** Running training *****
[INFO|trainer.py:2049] 2024-05-18 00:07:10,006 >> Num examples = 122
[INFO|trainer.py:2050] 2024-05-18 00:07:10,006 >> Num Epochs = 5
[INFO|trainer.py:2051] 2024-05-18 00:07:10,006 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2054] 2024-05-18 00:07:10,006 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2055] 2024-05-18 00:07:10,006 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2056] 2024-05-18 00:07:10,006 >> Total optimization steps = 75
[INFO|trainer.py:2057] 2024-05-18 00:07:10,007 >> Number of trainable parameters = 2,621,440
0%| | 0/75 [00:00<?, ?it/s]/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/torch/utils/checkpoint.py:91: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
Traceback (most recent call last):
File "/home/ybh/miniconda3/envs/nlpcc/bin/llamafactory-cli", line 8, in <module>
sys.exit(main())
File "/data/ybh/nlpcc/LLaMA-Factory-main/src/llamafactory/cli.py", line 65, in main
run_exp()
File "/data/ybh/nlpcc/LLaMA-Factory-main/src/llamafactory/train/tuner.py", line 33, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/data/ybh/nlpcc/LLaMA-Factory-main/src/llamafactory/train/sft/workflow.py", line 73, in run_sft
train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
File "/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/transformers/trainer.py", line 3147, in training_step
self.accelerator.backward(loss)
File "/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/accelerate/accelerator.py", line 2121, in backward
self.scaler.scale(loss).backward(**kwargs)
File "/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
torch.autograd.backward(
File "/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
_engine_run_backward(
File "/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
0%| | 0/75 [00:00<?, ?it/s]
Others
When I fine-tuned internlm-chat-7b with LoRA instead, this error did not occur.
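The two warnings in the log point at the likely cause: with reentrant gradient checkpointing, whether the checkpointed output gets a grad_fn depends only on the block's *inputs*. With a 4-bit quantized base model the embedding output does not require grad, so every checkpointed layer's output is detached, the LoRA adapters inside receive no gradients, and the loss ends up with no grad_fn — exactly this RuntimeError. Below is a minimal sketch of the failure mode using toy stand-ins (a frozen linear for the quantized base weights, a trainable linear for the LoRA adapter); none of this is LLaMA-Factory code:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)

frozen = nn.Linear(4, 4)          # stands in for the quantized base weights
for p in frozen.parameters():
    p.requires_grad_(False)
adapter = nn.Linear(4, 4)         # stands in for the trainable LoRA adapter

def block(h):
    # the adapter lives *inside* the gradient-checkpointed block
    return adapter(frozen(h))

x = torch.randn(2, 4)             # hidden states; requires_grad is False

# Reentrant checkpointing marks its output based only on the inputs:
# x does not require grad, so the output has no grad_fn. This triggers
# both the "None of the inputs have requires_grad=True" warning and the
# RuntimeError from the log above.
out = checkpoint(block, x, use_reentrant=True)
failed = False
try:
    out.sum().backward()
except RuntimeError:
    failed = True                 # "element 0 of tensors does not require grad ..."

# Workaround: force the block input to require grad. This mirrors what
# transformers' model.enable_input_require_grads() (also applied by
# peft's prepare_model_for_kbit_training) does via a hook on the
# embedding output.
x.requires_grad_(True)
out = checkpoint(block, x, use_reentrant=True)
out.sum().backward()              # gradients now reach the adapter
```

So one thing worth checking is whether `enable_input_require_grads()` is actually being applied to the 20B model's embedding layer when `quantization_bit: 4` and gradient checkpointing are combined.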