
TypeError: Trainer.__init__() got an unexpected keyword argument 'gpus' #93

Open
RupakBiswas-2304 opened this issue Mar 29, 2023 · 11 comments


@RupakBiswas-2304

[2023-03-30 02:45:03,037][__main__][INFO] - Instantiating callback <pytorch_lightning.callbacks.ModelCheckpoint>
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/rupax/github/state-spaces/train.py", line 553, in main
    train(config)
  File "/home/rupax/github/state-spaces/train.py", line 496, in train
    trainer = create_trainer(config)
  File "/home/rupax/github/state-spaces/train.py", line 485, in create_trainer
    trainer = pl.Trainer(
  File "/home/rupax/github/state-spaces/venv/lib/python3.10/site-packages/pytorch_lightning/utilities/argparse.py", line 69, in insert_env_defaults
    return fn(self, **kwargs)
TypeError: Trainer.__init__() got an unexpected keyword argument 'gpus'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.

How do I fix this? I am getting the same issue on two different machines:

  • Windows 11, Python 3.10, CUDA 11.7, torch 2.0.0 + CUDA
  • EndeavourOS, Python 3.10, no GPU, torch 2.0.0
@albertfgu
Contributor

This repo hasn't been updated to PyTorch 2.0 yet. The problem is due to an earlier change in pytorch-lightning, where the trainer.gpus argument was deprecated in favor of trainer.devices. The latest release of this repo should at least address this. If you're still seeing issues, can you paste the command line you ran?
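For reference, a hypothetical sketch of the API change (this helper is illustrative only, not part of the repo): in pytorch-lightning 2.x the gpus argument was removed in favor of accelerator/devices, so a legacy kwargs dict would need to be translated roughly like this:

```python
def translate_legacy_trainer_args(kwargs: dict) -> dict:
    """Map deprecated PL 1.x Trainer kwargs to their 2.x equivalents.

    Hypothetical helper for illustration; the name is not from the repo.
    """
    out = dict(kwargs)
    gpus = out.pop("gpus", None)
    if gpus:  # e.g. gpus=1 -> accelerator="gpu", devices=1
        out["accelerator"] = "gpu"
        out["devices"] = gpus
    # These two arguments were removed outright; their replacements are the
    # TQDMProgressBar and ModelSummary callbacks, respectively.
    out.pop("progress_bar_refresh_rate", None)
    out.pop("weights_summary", None)
    return out

print(translate_legacy_trainer_args({"gpus": 1, "max_epochs": 10, "weights_summary": "top"}))
# -> {'max_epochs': 10, 'accelerator': 'gpu', 'devices': 1}
```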

@RupakBiswas-2304
Author

I tried with torch==1.13.1+cu117 but got the same set of issues.
Also, I am using the simple branch and running this command: python -m train experiment=sashimi-youtubemix wandb=null.
Commenting out the gpus, progress_bar_refresh_rate, and weights_summary fields seems to be a temporary fix.
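For anyone hitting the same fields, a sketch of what the equivalent modern trainer config might look like (field names follow current pytorch-lightning; the repo's actual Hydra config may differ):

```yaml
trainer:
  accelerator: gpu   # replaces the removed `gpus` field
  devices: 1
  # progress_bar_refresh_rate: removed -> use the TQDMProgressBar callback
  # weights_summary: removed -> use the ModelSummary callback
```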

@albertfgu
Contributor

It is due to the pytorch-lightning version. If you update to the latest code and use the version in the requirements.txt (pytorch-lightning==1.9.3) it should work.

@albertfgu
Contributor

Did this fix the error?

@RupakBiswas-2304
Author

No, but I removed (commented out) some lines from the config file, and now it is running without error. However, my GPU runs out of memory every time I try to train the model; that's a different issue, I guess.

@albertfgu
Contributor

Are you on the latest commit? If there is a config in the repo that does not work out-of-the-box on the latest commit and with the provided requirements file, could you indicate which experiment config it is?

@RupakBiswas-2304
Author

To make sure I am not running with my changes, I cloned the main branch (latest commit) again and tried to train with
python -m train pipeline=mnist dataset.permute=True model=s4 model.n_layers=3 model.d_model=128 model.norm=batch model.prenorm=True wandb=null [copied from the README].
This is the error I get:
(screenshot of the error)

Actually, I was studying the SaShiMi model and trying to train with
python -m train experiment=audio/sashimi-sc09 model.n_layers=2 trainer.limit_train_batches=0.1 trainer.limit_val_batches=0.1 [again copied from the README],
and this is the error:
(screenshot of the error)

I tried (in another clone) the suggested thread and spawn options, and after that the config error appeared.

These are my torch versions:

torch==1.13.1+cu117
torchaudio==0.13.1
torchmetrics==0.11.4
torchtext==0.14.1
torchvision==0.14.1+cu117

@albertfgu
Contributor

That is strange. What OS and GPU are you using? Your first error reminds me of something I saw when I tried running the repo locally on my Macbook, and I never figured out the issue. The second issue seems related to package versions. What version of pytorch-lightning are you on?

@RupakBiswas-2304
Author

I am using

pytorch-lightning==1.9.3
torch==1.13.1+cu117

OS: Windows 11 and Ubuntu (WSL)
GPU: NVIDIA RTX 3060 6 GB
CPU: Ryzen 9
NVCC:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:59:34_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

@albertfgu
Contributor

Unfortunately I don't know the answer to this. My guess is that it's likely to be related to one of these packages + environment. I recently set up this repo in a new environment (standard Linux + CUDA + A100 GPU) and it works well.

To isolate the problem, I think it might be useful to try a minimal setup with PyTorch Lightning + wandb; these packages should have examples of setting up a basic Lightning Trainer for MNIST. I suspect you might run into the same issues.

@Iron-LYK

I fixed it with pip install pytorch-lightning==1.7.2.
