
DPA-2 does not support multi-card invocation. #3691

Open

wangyi01 opened this issue Apr 19, 2024 · 4 comments

Comments


wangyi01 commented Apr 19, 2024

Summary

When fine-tuning the first-step model of DPA-2, I keep encountering out-of-memory errors. Even reducing the batch size and switching to GPUs with more memory does not resolve the issue.

Details

A single card does not have enough memory, so I tried to run on multiple cards. In practice, however, only one card was invoked when using DPA-2, leading to insufficient memory. I have four GPUs with 16 GB each, but the error reports that the first GPU's memory is insufficient, which suggests that only the first GPU is being used. So why doesn't DPA-2 support multi-card operation?

@iProzd
Collaborator

iProzd commented Apr 19, 2024

@wangyi01 To help us locate your problem, please provide the following information if possible:

  • Your code version
  • The exact finetuning command you used
  • The input files you provided
  • The error log or any other relevant output

DPA-2 indeed supports multi-card operation, see here. When doing multi-GPU training, you should use the torchrun command like this:

torchrun --no_python --nproc_per_node=$KUBERNETES_CONTAINER_RESOURCE_GPU --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT --nnodes=$WORLD_SIZE --node_rank=$RANK dp --pt train input.json --skip-neighbor-stat

However, without the additional information we requested, it will be difficult for us to identify the specific issue you're facing. Please share the details, and we'll be happy to assist you further.
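For reference, torchrun communicates the distributed layout to each worker it spawns through environment variables (RANK, LOCAL_RANK, WORLD_SIZE), which is why the launch command above works without the training script itself taking rank arguments. A minimal sketch of how a launched process can inspect that layout; `distributed_layout` is a hypothetical helper, not part of deepmd-kit:

```python
import os

def distributed_layout():
    """Read the rank/world-size environment variables that torchrun
    sets for every worker process it spawns. Defaults correspond to
    a single-process, non-distributed run."""
    return {
        "rank": int(os.environ.get("RANK", 0)),            # global rank of this process
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),  # GPU index on this node
        "world_size": int(os.environ.get("WORLD_SIZE", 1)),  # total number of processes
    }

if __name__ == "__main__":
    # Under `torchrun --nproc_per_node=4 ...` each of the 4 workers
    # would print a different rank/local_rank here.
    print(distributed_layout())
```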

@iProzd iProzd removed the wontfix label Apr 19, 2024
@wangyi01
Author

step1-finetune (2).zip


wangyi01 commented Apr 20, 2024

> @wangyi01 To help us locate your problem, please provide the following information if possible:
>
>   • Your code version
>   • The exact finetuning command you used
>   • The input files you provided
>   • The error log or any other relevant output
>
> DPA-2 indeed supports multi-card operation, see here. When doing multi-GPU training, you should use the torchrun command like this:
>
> torchrun --no_python --nproc_per_node=$KUBERNETES_CONTAINER_RESOURCE_GPU --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT --nnodes=$WORLD_SIZE --node_rank=$RANK dp --pt train input.json --skip-neighbor-stat
>
> However, without the additional information we requested, it will be difficult for us to identify the specific issue you're facing. Please share the details, and we'll be happy to assist you further.

I used torchrun to implement multi-GPU invocation, but the -m parameter I passed for the model branch conflicts with dp's own -m parameter. Specifically, I ran:

torchrun --no_python --nproc_per_node=1 --nnode=4 dp --pt train input.json --finetune ./pretrained_model.pt -m Domains_OC2M --skip-neighbor-stat

and encountered the following error:

error: argument -m/--mpi-log: invalid choice: 'Domains_OC2M' (choose from 'master', 'collect', 'workers')
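The clash reported here is ordinary argparse behavior: when the parser already owns a short option -m, it consumes the value the user intended for a different purpose. A minimal reproduction of the reported "invalid choice" error, using a hypothetical parser that only mimics dp's -m/--mpi-log flag (this is not deepmd-kit's actual CLI code):

```python
import argparse

# Hypothetical parser mimicking the conflict: a -m/--mpi-log flag with
# choices master/collect/workers swallows the -m value the user meant
# for the model branch, producing argparse's "invalid choice" error.
parser = argparse.ArgumentParser(prog="dp")
parser.add_argument("-m", "--mpi-log",
                    choices=["master", "collect", "workers"],
                    default="master")

try:
    parser.parse_args(["-m", "Domains_OC2M"])
except SystemExit:
    # argparse prints the error to stderr and calls sys.exit(2):
    # error: argument -m/--mpi-log: invalid choice: 'Domains_OC2M'
    # (choose from 'master', 'collect', 'workers')
    print("argument conflict reproduced")
```

Using a distinct long-form option for the model branch (whatever form the installed deepmd-kit version accepts) avoids handing the value to -m/--mpi-log.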

@iProzd
Collaborator

iProzd commented Apr 30, 2024

See discussion #3689
