
DeepSpeed Zero2 & 3 not working as expected #263

Open
JinxuXiang opened this issue Apr 27, 2024 · 1 comment

@JinxuXiang

Has anyone tried using ZeRO-2 or ZeRO-3 in training? DeepSpeed does not seem to work as expected on my machine. I want to train on video at a very high resolution, where even one batch of data does not fit in 80 GB of GPU memory. I am hoping my 8 GPUs can be pooled so that 2 or 4 batches of data can be trained together.

I have also run experiments with lower-resolution data. No matter how many GPUs I use, the batch size cannot be reduced below the number of GPUs, and the per-GPU memory usage stays the same under ZeRO-2 or ZeRO-3 regardless of how many GPUs are used (see the sketch after this comment).

Thanks!
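For reference, DeepSpeed ties the global batch size to the data-parallel world size: `train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × world_size`, so with 8 GPUs the global batch can never be smaller than 8. Below is a minimal sketch of that arithmetic; the numbers are placeholders, not this repository's actual settings.

```python
# Minimal sketch of DeepSpeed's batch-size constraint under data parallelism.
# The numbers are illustrative placeholders, not Open-Sora-Plan defaults.
world_size = 8               # number of GPUs / data-parallel ranks
micro_batch_per_gpu = 1      # smallest possible per-GPU micro batch
grad_accum_steps = 1

# DeepSpeed requires:
#   train_batch_size == train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size
train_batch_size = micro_batch_per_gpu * grad_accum_steps * world_size
print(train_batch_size)      # 8 -> the global batch cannot drop below the GPU count
```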

@LinB203 (Member) commented Apr 28, 2024

The default training configuration uses ZeRO-2 mode.
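If a single batch does not fit in one GPU's memory, switching to ZeRO-3 with optional CPU offloading is the usual next step, since ZeRO-2 shards only optimizer states and gradients while ZeRO-3 also shards the model parameters across ranks. The snippet below is a generic DeepSpeed sketch, not this repository's shipped config; the model, learning rate, and offload settings are placeholder assumptions.

```python
# Generic sketch of enabling ZeRO stage 3 with DeepSpeed (placeholder values,
# not Open-Sora-Plan's actual training configuration).
import deepspeed
import torch

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 3,                              # shard params, grads, and optimizer states
        "offload_optimizer": {"device": "cpu"},  # optional: move optimizer states to CPU
        "offload_param": {"device": "cpu"},      # optional: move idle params to CPU
    },
}

model = torch.nn.Linear(1024, 1024)  # stand-in for the real model

# Launch with the `deepspeed` launcher so every rank joins the same process group;
# the engine then partitions parameters and optimizer state across all ranks.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

Note that ZeRO shards optimizer states, gradients, and (in stage 3) parameters, but not per-sample activations, so the activation memory of one very high-resolution sample still has to fit on a single GPU unless activation checkpointing or sequence/tensor parallelism is used as well.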
