
DeepSpeed Zero2 & 3 not working as expected #263

Open
JinxuXiang opened this issue Apr 27, 2024 · 1 comment

@JinxuXiang

Has anyone tried using ZeRO-2 or ZeRO-3 in training? DeepSpeed does not seem to work as expected on my machine. I want to train on video at a very high resolution, where even one batch of data does not fit in 80 GB of GPU memory. I am hoping my 8 GPUs can be pooled so that 2 or 4 batches of data can be trained together.

I have also run experiments with lower-resolution data. No matter how many GPUs I use, the batch size cannot be reduced below the number of GPUs, and the per-GPU memory usage stays the same under ZeRO-2 or ZeRO-3 regardless of how many GPUs are used (see the sketch after this comment).

Thanks!
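For reference, DeepSpeed ties the global batch size to the data-parallel world size: `train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × world_size`, so with 8 GPUs the global batch can never be smaller than 8. Below is a minimal sketch of that arithmetic; the numbers are placeholders, not this repository's actual settings.

```python
# Minimal sketch of DeepSpeed's batch-size constraint under data parallelism.
# The numbers are illustrative placeholders, not Open-Sora-Plan defaults.
world_size = 8               # number of GPUs / data-parallel ranks
micro_batch_per_gpu = 1      # smallest possible per-GPU micro batch
grad_accum_steps = 1

# DeepSpeed requires:
#   train_batch_size == train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size
train_batch_size = micro_batch_per_gpu * grad_accum_steps * world_size
print(train_batch_size)      # 8 -> the global batch cannot drop below the GPU count
```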

@LinB203 (Member) commented Apr 28, 2024

The default training configuration uses ZeRO-2 mode.
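If a single batch does not fit in one GPU's memory, switching to ZeRO-3 with optional CPU offloading is the usual next step, since ZeRO-2 shards only optimizer states and gradients while ZeRO-3 also shards the model parameters across ranks. The snippet below is a generic DeepSpeed sketch, not this repository's shipped config; the model, learning rate, and offload settings are placeholder assumptions.

```python
# Generic sketch of enabling ZeRO stage 3 with DeepSpeed (placeholder values,
# not Open-Sora-Plan's actual training configuration).
import deepspeed
import torch

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 3,                              # shard params, grads, and optimizer states
        "offload_optimizer": {"device": "cpu"},  # optional: move optimizer states to CPU
        "offload_param": {"device": "cpu"},      # optional: move idle params to CPU
    },
}

model = torch.nn.Linear(1024, 1024)  # stand-in for the real model

# Launch with the `deepspeed` launcher so every rank joins the same process group;
# the engine then partitions parameters and optimizer state across all ranks.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

Note that ZeRO shards optimizer states, gradients, and (in stage 3) parameters, but not per-sample activations, so the activation memory of one very high-resolution sample still has to fit on a single GPU unless activation checkpointing or sequence/tensor parallelism is used as well.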
