accelerate launch train.py: parallel training is too slow; it seems accelerate model parallelism is not working #17

Open
lileishitou opened this issue Aug 8, 2023 · 0 comments

How should I use the command "accelerate config" to generate /home/duser/.cache/huggingface/accelerate/default_config.yaml?

I have tried the two configurations below; both let accelerate launch train.py run, but the model-parallel training does not seem to work (it is very slow). So how should accelerate be configured?

(1)

In which compute environment are you running?
This machine
Which type of machine are you using?
multi-GPU
How many different machines will you use (use more than 1 for multi-node training)? [1]:
Do you wish to optimize your script with torch dynamo? [yes/NO]: NO
Do you want to use DeepSpeed? [yes/NO]: NO
Do you want to use FullyShardedDataParallel? [yes/NO]: NO
Do you want to use Megatron-LM ? [yes/NO]: NO
How many GPU(s) should be used for distributed training? [1]:5
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:1,2,3,4,5
Do you wish to use FP16 or BF16 (mixed precision)?
bf16
accelerate configuration saved at /home/duser/.cache/huggingface/accelerate/default_config.yaml
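
For reference, a sketch of roughly what the saved default_config.yaml would look like for the answers in (1). This is an approximation, not verbatim output; key names vary slightly between accelerate versions.

```yaml
# Approximate contents of default_config.yaml for configuration (1).
# Key names may differ between accelerate versions; treat this as a sketch.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU        # DistributedDataParallel across the listed GPUs
downcast_bf16: 'no'
gpu_ids: 1,2,3,4,5
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 5
rdzv_backend: static
same_network: true
use_cpu: false
```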

(2)
In which compute environment are you running?
This machine
Which type of machine are you using?
multi-GPU
How many different machines will you use (use more than 1 for multi-node training)? [1]:
Do you wish to optimize your script with torch dynamo? [yes/NO]: NO
Do you want to use DeepSpeed? [yes/NO]: NO
Do you want to use FullyShardedDataParallel? [yes/NO]: yes
What should be your sharding strategy?
FULL_SHARD
Do you want to offload parameters and gradients to CPU? [yes/NO]: yes
What should be your auto wrap policy?
NO_WRAP
What should be your FSDP's backward prefetch policy?
BACKWARD_PRE
What should be your FSDP's state dict type?
FULL_STATE_DICT
How many GPU(s) should be used for distributed training? [1]:5
Do you wish to use FP16 or BF16 (mixed precision)?
no
accelerate configuration saved at /home/duser/.cache/huggingface/accelerate/default_config.yaml
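
And a sketch of roughly what default_config.yaml would contain for the FSDP answers in (2), again approximate; the FSDP key names and value encodings have changed across accelerate versions (e.g. the sharding strategy may be stored as an integer rather than a string in older releases).

```yaml
# Approximate contents of default_config.yaml for configuration (2).
# FSDP keys/values are a sketch; older accelerate versions encode some of them differently.
compute_environment: LOCAL_MACHINE
distributed_type: FSDP             # fully sharded data parallel across 5 processes
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: NO_WRAP
  fsdp_backward_prefetch_policy: BACKWARD_PRE
  fsdp_offload_params: true       # offload parameters and gradients to CPU
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 5
rdzv_backend: static
same_network: true
use_cpu: false
```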
