
DPA-2 does not support multi-card invocation. #3691

Open

wangyi01 opened this issue Apr 19, 2024 · 4 comments

Comments


wangyi01 commented Apr 19, 2024

Summary

When fine-tuning the first-step model of DPA-2, I keep encountering out-of-memory errors. Even reducing the batch size and switching to GPUs with more memory does not resolve the issue.

Details

A single card does not have enough memory, so I tried to run on multiple cards. In practice, however, only one card was invoked when using DPA-2, leading to insufficient memory. I have four GPUs with 16 GB each, but the error reports that the first GPU's memory is insufficient, which suggests that only the first GPU is being used. So why doesn't DPA-2 support multi-card operation?

@iProzd
Collaborator

iProzd commented Apr 19, 2024

@wangyi01 To help us locate your problem, please provide the following information if possible:

  • Your code version
  • The exact finetuning command you used
  • The input files you provided
  • The error log or any other relevant output

DPA-2 indeed supports multi-card operation, see here. When doing multi-GPU training, you should use the torchrun command like this:

torchrun --no_python --nproc_per_node=$KUBERNETES_CONTAINER_RESOURCE_GPU --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT --nnodes=$WORLD_SIZE --node_rank=$RANK dp --pt train input.json --skip-neighbor-stat

However, without the additional information we requested, it will be difficult for us to identify the specific issue you're facing. Please share the details, and we'll be happy to assist you further.
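For reference, torchrun communicates the distributed layout to each worker it spawns through environment variables (RANK, LOCAL_RANK, WORLD_SIZE), which is why the launch command above works without the training script itself taking rank arguments. A minimal sketch of how a launched process can inspect that layout; `distributed_layout` is a hypothetical helper, not part of deepmd-kit:

```python
import os

def distributed_layout():
    """Read the rank/world-size environment variables that torchrun
    sets for every worker process it spawns. Defaults correspond to
    a single-process, non-distributed run."""
    return {
        "rank": int(os.environ.get("RANK", 0)),            # global rank of this process
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),  # GPU index on this node
        "world_size": int(os.environ.get("WORLD_SIZE", 1)),  # total number of processes
    }

if __name__ == "__main__":
    # Under `torchrun --nproc_per_node=4 ...` each of the 4 workers
    # would print a different rank/local_rank here.
    print(distributed_layout())
```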

@iProzd iProzd removed the wontfix label Apr 19, 2024
@wangyi01
Author

step1-finetune (2).zip


wangyi01 commented Apr 20, 2024

> @wangyi01 To help us locate your problem, please provide the following information if possible:
>
>   • Your code version
>   • The exact finetuning command you used
>   • The input files you provided
>   • The error log or any other relevant output
>
> DPA-2 indeed supports multi-card operation, see here. When doing multi-GPU training, you should use the torchrun command like this:
>
> torchrun --no_python --nproc_per_node=$KUBERNETES_CONTAINER_RESOURCE_GPU --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT --nnodes=$WORLD_SIZE --node_rank=$RANK dp --pt train input.json --skip-neighbor-stat
>
> However, without the additional information we requested, it will be difficult for us to identify the specific issue you're facing. Please share the details, and we'll be happy to assist you further.

I used torchrun to implement multi-GPU invocation, but the -m parameter I passed for the model branch conflicts with dp's own -m parameter. Specifically, I ran:

torchrun --no_python --nproc_per_node=1 --nnode=4 dp --pt train input.json --finetune ./pretrained_model.pt -m Domains_OC2M --skip-neighbor-stat

and encountered the following error:

error: argument -m/--mpi-log: invalid choice: 'Domains_OC2M' (choose from 'master', 'collect', 'workers')
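The clash reported here is ordinary argparse behavior: when the parser already owns a short option -m, it consumes the value the user intended for a different purpose. A minimal reproduction of the reported "invalid choice" error, using a hypothetical parser that only mimics dp's -m/--mpi-log flag (this is not deepmd-kit's actual CLI code):

```python
import argparse

# Hypothetical parser mimicking the conflict: a -m/--mpi-log flag with
# choices master/collect/workers swallows the -m value the user meant
# for the model branch, producing argparse's "invalid choice" error.
parser = argparse.ArgumentParser(prog="dp")
parser.add_argument("-m", "--mpi-log",
                    choices=["master", "collect", "workers"],
                    default="master")

try:
    parser.parse_args(["-m", "Domains_OC2M"])
except SystemExit:
    # argparse prints the error to stderr and calls sys.exit(2):
    # error: argument -m/--mpi-log: invalid choice: 'Domains_OC2M'
    # (choose from 'master', 'collect', 'workers')
    print("argument conflict reproduced")
```

Using a distinct long-form option for the model branch (whatever form the installed deepmd-kit version accepts) avoids handing the value to -m/--mpi-log.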

@iProzd
Collaborator

iProzd commented Apr 30, 2024

See discussion #3689
