
Is MolGAN incompatible with max_atoms=31 or the ZINC250k dataset? #3962

Open
UmarZein opened this issue Apr 25, 2024 · 4 comments

UmarZein commented Apr 25, 2024

❓ Questions & Help

I have replicated 90% of https://deepchem.io/tutorials/generating-molecules-with-molgan/

I assume this is more of a problem with the model/architecture than with the library's implementation, but if anyone has used this library's MolGAN to fit an at least equally complex dataset, please say so.

The differences from the tutorial are (a rough sketch of the full setup follows this list):

  • I am using ZINC250k (https://raw.githubusercontent.com/aspuru-guzik-group/chemical_vae/master/models/zinc_properties/250k_rndm_zinc_drugs_clean_3.csv)
  • num_atoms=31 (95% of the molecules in the dataset have at most 31 atoms)
  • atom_labels=[0, 6, 7, 8, 9, 15, 16, 17, 35, 53]
  • tried MolGAN(learning_rate=ExponentialDecay(0.0001, 0.9, 5000), vertices=max_atoms, model_dir="model_dir")
  • tried MolGAN(learning_rate=ExponentialDecay(0.001, 0.9, 5000), vertices=max_atoms, model_dir="model_dir")
  • code to fit: gan.fit_gan(iterbatches(25), generator_steps=0.2, checkpoint_interval=1000, max_checkpoints_to_keep=25, restore=False)
  • dataset.X.shape is (235043, 31, 31)
  • dataset.y.shape is (235043, 31)
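
Roughly, the setup looks like this (a sketch reconstructed from the linked tutorial; the exact imports, attribute names, and the "smiles" CSV column are assumptions rather than lines copied from my notebook):

import pandas as pd
import tensorflow as tf
import deepchem as dc
from deepchem.models import BasicMolGANModel as MolGAN
from deepchem.models.optimizers import ExponentialDecay
from deepchem.feat.molecule_featurizers.molgan_featurizer import GraphMatrix

max_atoms = 31
atom_labels = [0, 6, 7, 8, 9, 15, 16, 17, 35, 53]

# Featurize ZINC250k SMILES into MolGAN graph matrices, dropping failed molecules
df = pd.read_csv("250k_rndm_zinc_drugs_clean_3.csv")
featurizer = dc.feat.MolGanFeaturizer(max_atom_count=max_atoms, atom_labels=atom_labels)
features = [f for f in featurizer.featurize(df["smiles"].str.strip().tolist())
            if isinstance(f, GraphMatrix)]

# X holds adjacency matrices (N, 31, 31); y holds node features (N, 31)
dataset = dc.data.NumpyDataset([f.adjacency_matrix for f in features],
                               [f.node_features for f in features])

gan = MolGAN(learning_rate=ExponentialDecay(0.0001, 0.9, 5000),
             vertices=max_atoms, model_dir="model_dir")

def iterbatches(epochs):
    # One-hot encode adjacency and node tensors per batch, as in the tutorial
    for _ in range(epochs):
        for batch in dataset.iterbatches(batch_size=gan.batch_size, pad_batches=True):
            adjacency_tensor = tf.one_hot(batch[0], gan.edges)
            node_tensor = tf.one_hot(batch[1], gan.nodes)
            yield {gan.data_inputs[0]: adjacency_tensor,
                   gan.data_inputs[1]: node_tensor}

gan.fit_gan(iterbatches(25), generator_steps=0.2, checkpoint_interval=1000,
            max_checkpoints_to_keep=25, restore=False)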

When I fit the model to the data, the generator loss keeps growing:

epoch-0
Ending global_step 999: generator average loss 11.9274, discriminator average loss -23.6026
Ending global_step 1999: generator average loss 22.0532, discriminator average loss -44.0165

  4%|▍         | 1/25 [01:09<27:49, 69.55s/it]

epoch-1
Ending global_step 2999: generator average loss 31.1893, discriminator average loss -62.301
Ending global_step 3999: generator average loss 40.2527, discriminator average loss -80.4343

  8%|▊         | 2/25 [02:13<25:21, 66.16s/it]

epoch-2
Ending global_step 4999: generator average loss 49.3043, discriminator average loss -98.5299
Ending global_step 5999: generator average loss 58.3535, discriminator average loss -116.639
Ending global_step 6999: generator average loss 67.4018, discriminator average loss -134.74

 12%|█▏        | 3/25 [03:20<24:24, 66.59s/it]

epoch-3
Ending global_step 7999: generator average loss 76.4501, discriminator average loss -152.837
Ending global_step 8999: generator average loss 85.4984, discriminator average loss -170.934

 16%|█▌        | 4/25 [04:25<23:02, 65.82s/it]

epoch-4
Ending global_step 9999: generator average loss 94.5468, discriminator average loss -189.033
Ending global_step 10999: generator average loss 103.595, discriminator average loss -207.131

 20%|██        | 5/25 [05:30<21:56, 65.83s/it]

epoch-5
Ending global_step 11999: generator average loss 112.644, discriminator average loss -225.23
Ending global_step 12999: generator average loss 121.692, discriminator average loss -243.326
Ending global_step 13999: generator average loss 130.741, discriminator average loss -261.422

 24%|██▍       | 6/25 [06:41<21:19, 67.36s/it]

epoch-6
Ending global_step 14999: generator average loss 139.789, discriminator average loss -279.521
Ending global_step 15999: generator average loss 148.838, discriminator average loss -297.618

 28%|██▊       | 7/25 [07:45<19:51, 66.21s/it]

epoch-7
Ending global_step 16999: generator average loss 157.886, discriminator average loss -315.715
Ending global_step 17999: generator average loss 166.935, discriminator average loss -333.813

@rbharath (Member) commented:

CC @shreyasvinaya who may have some insights

@shreyasvinaya (Member) commented:

Hi @UmarZein,
The losses reported in the terminal are actually running sums rather than per-step losses, so the growth is relatively harmless. Please run the training to completion and let me know if it finishes successfully.
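
If they are running sums, the per-interval averages can be recovered by differencing consecutive reports. A minimal illustration (not DeepChem code; the numbers are copied from the first three reports in the log above):

reports = [
    (999, 11.9274, -23.6026),
    (1999, 22.0532, -44.0165),
    (2999, 31.1893, -62.3010),
]

prev_step, prev_gen, prev_disc = 0, 0.0, 0.0
for step, gen_sum, disc_sum in reports:
    n = step - prev_step  # steps covered by this report
    print(f"steps {prev_step + 1}-{step}: "
          f"generator {(gen_sum - prev_gen) / n:.4f}, "
          f"discriminator {(disc_sum - prev_disc) / n:.4f}")
    prev_step, prev_gen, prev_disc = step, gen_sum, disc_sum

On these numbers the per-step generator loss stays roughly flat (about 0.009-0.012), which is consistent with the running-sum interpretation.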

@UmarZein (Author) commented:

I haven't been able to get a good run.
Previously it experienced mode collapse, but today I tried again and generation failed completely.
Notebook: https://gist.github.com/UmarZein/d9613e44922b2cba4199f8175d3b89f2
I am not familiar with TensorFlow: I am using runpod.io's TensorFlow image/template, and a couple of warnings were printed in red.
When I run import tensorflow as tf; print(tf.config.list_physical_devices('GPU')):

2024-04-30 06:38:19.185759: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-30 06:38:19.185804: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-30 06:38:19.186730: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-30 06:38:19.291660: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

When running gan = MolGAN(learning_rate=ExponentialDecay(0.0001, 0.9, 5000), vertices=max_atoms, model_dir="model_dir"):

2024-04-30 06:53:18.816166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 18452 MB memory:  -> device: 0, name: NVIDIA RTX 4000 Ada Generation, pci bus id: 0000:82:00.0, compute capability: 8.9

After that line, it consumes 18 GB of VRAM (I don't know whether that is the model's actual footprint or whether something is wrong with the Docker container).
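
(Possibly relevant: TensorFlow by default reserves most of the GPU's memory as soon as the device is created, so the 18 GB figure may just be the default allocator behaviour rather than the model's real footprint. A small sketch, using the standard TensorFlow API and run before anything touches the GPU, that switches to on-demand allocation:)

import tensorflow as tf

# Grow GPU memory usage on demand instead of grabbing (almost) all VRAM
# when the device is first created.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)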

@rbharath (Member) commented:

@shreyasvinaya Is the new torch molgan stable to run?
