RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
Prerequisite
Environment
PyTorch 2.3, CUDA 12.3, training on GPU
Reproduces the problem - code sample
https://github.com/open-mmlab/mmengine/blob/main/examples/llama2/fsdp_finetune.py
Modified to train the InternLM model:
# Prepare model for internlm2 by wuzhhui
model, tokenizer = build_model(
    model_name_or_path=args.checkpoint,
    return_tokenizer=True)
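For reference, the failing kernel can likely be reproduced in isolation by calling the private op the traceback points at directly. This is a minimal sketch, assuming a CUDA device and an affected PyTorch build (the op is internal, so its signature may differ across versions):

import torch

# Minimal sketch (assumption): feed bf16 "gradients" to the private op
# named in the traceback, the way ShardedGradScaler.unscale_ does.
grads = [torch.randn(4, device='cuda', dtype=torch.bfloat16)]
found_inf = torch.zeros(1, device='cuda', dtype=torch.float32)
inv_scale = torch.ones(1, device='cuda', dtype=torch.float32)
torch._amp_foreach_non_finite_check_and_unscale_(grads, found_inf, inv_scale)
# Expected on affected builds:
# RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'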
Reproduces the problem - command or script
LOGLEVEL=DEBUG NPROC_PER_NODE=1 torchrun fsdp_finetune.py /models/instruct-finetrain.json /models/internlm2-1_8b --max-epoch 100 --save-interval 50 --output-dir ${work_dir}
Reproduces the problem - error message
RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
[rank0]: Traceback (most recent call last):
[rank0]: File "/models/internlm2-1_8b_fsdp_train/fsdp_finetune.py", line 185, in <module>
[rank0]: train()
[rank0]: File "/models/internlm2-1_8b_fsdp_train/fsdp_finetune.py", line 161, in train
[rank0]: optimizer.update_params(loss)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/mmengine/optim/optimizer/optimizer_wrapper.py", line 201, in update_params
[rank0]: self.step(**step_kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/mmengine/optim/scheduler/param_scheduler.py", line 115, in wrapper
[rank0]: return wrapped(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/mmengine/optim/optimizer/amp_optimizer_wrapper.py", line 137, in step
[rank0]: self.loss_scaler.unscale_(self.optimizer)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/sharded_grad_scaler.py", line 278, in unscale_
[rank0]: optimizer_state["found_inf_per_device"] = self.unscale_grads(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/sharded_grad_scaler.py", line 243, in unscale_grads
[rank0]: torch._amp_foreach_non_finite_check_and_unscale_(
[rank0]: RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
Additional information
The problem may lie in the model itself or in the library; a sketch of a possible workaround follows below.
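Since bfloat16 shares float32's exponent range, loss scaling is generally unnecessary for bf16 training, so one workaround may be to keep the unscale kernel from ever seeing bf16 tensors. This is only a sketch of that idea, not a verified fix; the helper name is hypothetical and not part of mmengine:

import torch

def cast_bf16_grads_to_fp32(module: torch.nn.Module) -> None:
    # Hypothetical helper (assumption, not a verified fix): upcast bf16
    # gradients to fp32 before the scaler's unscale_ step, so the CUDA
    # kernel that lacks a BFloat16 implementation only sees fp32 tensors.
    for p in module.parameters():
        if p.grad is not None and p.grad.dtype is torch.bfloat16:
            p.grad = p.grad.float()

# Call right after backward and before optimizer.update_params(loss).

Alternatively, disabling gradient scaling altogether (e.g. constructing the FSDP scaler with ShardedGradScaler(enabled=False)) might sidestep the unimplemented kernel, at the cost of skipping the non-finite gradient check.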