This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

Which configs relate to learning rate warm-up, weight decay, and momentum in a 1-node, n-GPU (1 < n < 8) setup? #584

tungts1101 opened this issue Jan 30, 2024 · 0 comments


❓ How to do something using VISSL

Describe what you want to do, including:

  1. what I am trying to do: I have read the ImageNet-in-1-hour paper. It discusses learning rate warm-up, weight decay, and momentum for distributed training on one node with multiple GPUs. However, I could not find any documentation on the corresponding configs. How do I set them properly?
  2. what outputs you are expecting: the config options, and an explanation of the learning rate warm-up strategy, weight decay, and momentum for a 1-node, n-GPU machine.
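For reference, a minimal sketch of the relevant `OPTIMIZER` section of a VISSL YAML config, covering momentum, weight decay, and a linear warm-up schedule. The exact key names and values below are assumptions based on VISSL's documented config schema and may differ between versions; the `base_value`, milestone, and length numbers are illustrative, not recommendations:

```yaml
OPTIMIZER:
  name: sgd
  momentum: 0.9            # SGD momentum
  weight_decay: 0.0001     # L2 regularization
  param_schedulers:
    lr:
      # Scale the base LR with the global batch size (linear scaling rule
      # from the ImageNet-in-1-hour paper): lr = base_value * batch / 256.
      auto_lr_scaling:
        auto_scale: true
        base_value: 0.1
        base_lr_batch_size: 256
      # Compose a linear warm-up phase with the main schedule.
      name: composite
      schedulers:
        - name: linear        # warm-up: ramp LR up over the first epochs
          start_value: 0.025
          end_value: 0.1
        - name: multistep     # main schedule: step decay at milestones
          values: [0.1, 0.01, 0.001]
          milestones: [30, 60]
      lengths: [0.1, 0.9]     # fraction of training spent in each phase
      interval_scaling: [rescaled, fixed]
      update_interval: epoch
```

With `auto_lr_scaling` enabled, the same config should adapt as you change the number of GPUs per node, since the effective LR is derived from the global batch size rather than hard-coded.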

❓ What does an API do and how to use it?

Please link to which API or documentation you're asking about from
https://github.com/facebookresearch/vissl/tree/main/docs
