
[Bug] Distributed training code example throws an error #1540

Closed · 2 tasks done
apachemycat opened this issue May 7, 2024 · 5 comments
Labels: bug (Something isn't working)

@apachemycat
Prerequisite

Environment

PyTorch 2.3, CUDA 12.3, GPU training

Reproduces the problem - code sample

https://github.com/open-mmlab/mmengine/blob/main/examples/llama2/fsdp_finetune.py
Modified to fine-tune the InternLM (书生) model:
# Prepare model for internlm2 by wuzhhui
model, tokenizer = build_model(
    model_name_or_path=args.checkpoint,
    return_tokenizer=True)

# Prepare model for llama
# tokenizer = LlamaTokenizer.from_pretrained(args.checkpoint)
# tokenizer.add_special_tokens({'pad_token': '<PAD>'})
# model = LlamaForCausalLM.from_pretrained(args.checkpoint)
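
For reference, a minimal sketch of loading InternLM2 without the custom build_model helper, using the generic Hugging Face Auto classes; the pad-token fallback below is an assumption, not part of the original example:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    args.checkpoint, trust_remote_code=True)
if tokenizer.pad_token is None:
    # Reuse the eos token as padding when the checkpoint defines no pad token.
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    args.checkpoint, trust_remote_code=True)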

Reproduces the problem - command or script

LOGLEVEL=DEBUG NPROC_PER_NODE=1 torchrun fsdp_finetune.py /models/instruct-finetrain.json /models/internlm2-1_8b --max-epoch 100 --save-interval 50 --output-dir ${work_dir}

Reproduces the problem - error message

RuntimeError: "amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
[rank0]: Traceback (most recent call last):
[rank0]: File "/models/internlm2-1_8b_fsdp_train/fsdp_finetune.py", line 185, in
[rank0]: train()
[rank0]: File "/models/internlm2-1_8b_fsdp_train/fsdp_finetune.py", line 161, in train
[rank0]: optimizer.update_params(loss)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/mmengine/optim/optimizer/optimizer_wrapper.py", line 201, in update_params
[rank0]: self.step(**step_kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/mmengine/optim/scheduler/param_scheduler.py", line 115, in wrapper
[rank0]: return wrapped(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/mmengine/optim/optimizer/amp_optimizer_wrapper.py", line 137, in step
[rank0]: self.loss_scaler.unscale_(self.optimizer)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/sharded_grad_scaler.py", line 278, in unscale_
[rank0]: optimizer_state["found_inf_per_device"] = self._unscale_grads_(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/sharded_grad_scaler.py", line 243, in _unscale_grads_
[rank0]: torch._amp_foreach_non_finite_check_and_unscale_(
[rank0]: RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'

Additional information

Maybe it is an issue with the model or with the library?

@apachemycat added the bug (Something isn't working) label on May 7, 2024
@zhouzaida
Member

What GPU model are you using?

@zhouzaida
Member

It is possible that your GPU does not support bfloat16 computation.
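
A quick way to check this on the training node (a minimal sketch, assuming a single-GPU setup):

import torch

print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))  # native bf16 needs compute capability >= (8, 0)
print(torch.cuda.is_bf16_supported())       # PyTorch's own bfloat16 capability check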

@zhouzaida
Member

Also, if you want to fine-tune InternLM models, XTuner (https://github.com/InternLM/xtuner) is recommended.

@apachemycat
Author

GPU 1: Tesla V100-PCIE-32GB

@zhouzaida
Member

> GPU 1: Tesla V100-PCIE-32GB

The V100 most likely does not support bfloat16.
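
If switching hardware is not an option, a possible workaround is to fall back to float16 on pre-Ampere GPUs. Below is a minimal sketch at the plain PyTorch FSDP level; where exactly the policy plugs into fsdp_finetune.py, and the matching dtype for the AMP optimizer wrapper, are assumptions rather than the example's actual configuration:

import torch
from torch.distributed.fsdp import MixedPrecision

# float16 runs on Volta (V100) but needs gradient scaling, which the
# sharded grad scaler shown in the traceback already provides.
use_bf16 = torch.cuda.get_device_capability(0) >= (8, 0)
dtype = torch.bfloat16 if use_bf16 else torch.float16

# Hypothetical: pass this policy wherever the example builds its FSDP wrapper.
mp_policy = MixedPrecision(
    param_dtype=dtype,
    reduce_dtype=dtype,
    buffer_dtype=dtype,
)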
