
Socket conflict bug when running vanilla_vae with celeba dataset #65

Open
BWN133 opened this issue Aug 3, 2022 · 2 comments

Comments

@BWN133

BWN133 commented Aug 3, 2022

Hi, I am trying to run the vanilla_vae model with the CelebA dataset on my personal device, but I am getting a weird error telling me that there is a socket conflict. The system just hangs after the error. Does anyone have any idea how to solve this?
For more detail, please refer to:
https://stackoverflow.com/questions/73215732/socket-conflict-while-running-vaes

Thanks!

@AntixK
Owner

AntixK commented Aug 3, 2022

What does your config file look like?

If you are training on CPU only, then simply set the gpus field in the config file to empty or none.
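
For example, the trainer section of the YAML config might then look like the following (a minimal sketch; the max_epochs value is just carried over from the config posted below):

trainer_params:
  gpus: null        # leave empty / null to train on CPU only
  max_epochs: 100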

@BWN133
Author

BWN133 commented Aug 3, 2022

Thanks a lot for replying! My original config looks like the following:

model_params:
  name: 'VanillaVAE'
  in_channels: 3
  latent_dim: 128


data_params:
  data_path: "Data/"
  train_batch_size: 64
  val_batch_size:  64
  patch_size: 64
  num_workers: 4


exp_params:
  LR: 0.005
  weight_decay: 0.0
  scheduler_gamma: 0.95
  kld_weight: 0.00025
  manual_seed: 1265

trainer_params:
  gpus: [0]
  max_epochs: 100

logging_params:
  save_dir: "logs/"
  name: "VanillaVAE"

I tried changing gpus to null or just deleting it directly, but then there is a new error that says:

Traceback (most recent call last):
  File "C:\Users\huklab\Desktop\odin\PyTorch-VAE\run.py", line 46, in <module>
    data = VAEDataset(**config["data_params"], pin_memory=len(config['trainer_params']['gpus']) != 0)
TypeError: object of type 'NoneType' has no len()

I tried removing the ['gpus'] lookup and it gives me the exact same error I was having before. (I am training on a machine with a single GPU, not just a CPU, so I believe the config shouldn't be the problem.)
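
For what it's worth, one way to make that line in run.py tolerate a missing or null gpus entry would be a guard like the sketch below; the .get() call with an "or []" fallback is my own assumption, not code from the repository:

# Hypothetical guard: treat a missing or null 'gpus' entry as CPU-only,
# so pin_memory is enabled only when at least one GPU is configured.
gpus = config['trainer_params'].get('gpus') or []
data = VAEDataset(**config["data_params"], pin_memory=len(gpus) != 0)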
