This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

torch_xla compatibility option? #555

Open
adamcatto opened this issue Jun 21, 2022 · 1 comment

@adamcatto

🚀 Feature

An option to train models on a TPU or TPU pod using the torch_xla package.

Motivation & Examples

Motivation: speed up training, utilize best available resources.

Example: in vissl/vissl/trainer/trainer_main.py, start by changing SelfSupervisionTrainer.setup_distributed(self, use_gpu) to something like SelfSupervisionTrainer.setup_distributed(self, device), then put the TPU-specific setup behind an if device == 'TPU' branch, or something along these lines (see the sketch below). Relevant changes to the other functions could follow afterwards.
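A rough sketch of what that branching could look like (not actual VISSL code; the attribute names, the device strings, and the GPU/CPU branches are only placeholders for illustration):

```python
# Rough sketch only, not actual VISSL code: the real setup_distributed() in
# vissl/trainer/trainer_main.py does a lot more (process groups, NCCL options,
# etc.). This just illustrates the proposed device-based branching.
import torch
import torch.distributed as dist


class SelfSupervisionTrainer:
    def setup_distributed(self, device: str):
        """Initialize the training device; `device` replaces the current
        `use_gpu` boolean and could be "GPU", "CPU", or "TPU"."""
        if device == "TPU":
            # torch_xla would be an optional dependency, so import it lazily.
            import torch_xla.core.xla_model as xm

            # Each TPU process gets its own XLA device; rank and world size
            # come from the XLA runtime rather than torch.distributed.
            self.device = xm.xla_device()
            self.distributed_rank = xm.get_ordinal()
            self.world_size = xm.xrt_world_size()
        elif device == "GPU":
            dist.init_process_group(backend="nccl")
            self.device = torch.device("cuda", torch.cuda.current_device())
            self.distributed_rank = dist.get_rank()
            self.world_size = dist.get_world_size()
        else:
            dist.init_process_group(backend="gloo")
            self.device = torch.device("cpu")
            self.distributed_rank = dist.get_rank()
            self.world_size = dist.get_world_size()
```

The rest of the trainer could then key off self.device instead of a use_gpu flag.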

(Note: I will likely start working on this; I am new to VISSL, so I figure a regular contributor might be better-equipped to handle this, but I can give it a go nonetheless.)

@QuentinDuval
Contributor

Hey @adamcatto,

Thanks a lot for raising the point :)

To be fair, we did take a look at PyTorch/XLA last year to see if we could get something out of it, but we did not move forward for several reasons: PyTorch/XLA was still relatively new, and at that time we were training ConvNets, for which GPUs are actually pretty good. But now that Vision Transformers are in the codebase, it might indeed be worth looking into.

For the moment, however, I am not familiar enough with PyTorch/XLA and the TPU ecosystem to drive such changes myself (my understanding is that running on TPU takes more than just changing the device: the data loader, the way the model is saved, the way data is fetched, and even how jobs are launched on GCP would all have to be adapted). It is however part of my personal goals to play with those technologies, so that might change.
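For illustration, the kind of TPU-specific glue I have in mind looks roughly like this (standard torch_xla idioms as far as I understand them; the functions below are dummies, not anything that exists in VISSL today):

```python
# Sketch of standard torch_xla idioms (not VISSL code): it shows why TPU
# support touches more than the device -- the data loader, the optimizer
# step, and checkpointing all change.
import torch.nn.functional as F
import torch_xla.core.xla_model as xm
import torch_xla.distributed.parallel_loader as pl


def train_one_epoch(model, loader, optimizer, device):
    # MpDeviceLoader wraps a regular DataLoader and moves batches to the
    # TPU in the background; it also steps the lazy XLA graph per batch.
    tpu_loader = pl.MpDeviceLoader(loader, device)
    for images, targets in tpu_loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(images), targets)
        loss.backward()
        # optimizer_step() all-reduces the gradients across TPU cores
        # before applying the update.
        xm.optimizer_step(optimizer)


def save_checkpoint(model, path):
    # xm.save() moves tensors to CPU and writes from the master ordinal
    # only, replacing the usual torch.save() call.
    xm.save(model.state_dict(), path)
```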

If you feel up for it, we can start discussing what would need to be changed, which test case you would like to move forward with first, etc.

What do you think?
Quentin
