
Is MolGAN incompatible with max_atoms=31 or the ZINC250k dataset? #3962

Open
UmarZein opened this issue Apr 25, 2024 · 4 comments

UmarZein commented Apr 25, 2024

❓ Questions & Help

I have replicated 90% of https://deepchem.io/tutorials/generating-molecules-with-molgan/

I assume this is more of a problem with the model/architecture than with the library's implementation, but if anyone has used this library's MolGAN to fit an at least equally complex dataset, please say so.

The differences from the tutorial are (a rough sketch of the full setup follows this list):

  • I am using ZINC250k (https://raw.githubusercontent.com/aspuru-guzik-group/chemical_vae/master/models/zinc_properties/250k_rndm_zinc_drugs_clean_3.csv)
  • num_atoms=31 (95% of the molecules in the dataset have at most 31 atoms)
  • atom_labels=[0, 6, 7, 8, 9, 15, 16, 17, 35, 53]
  • tried MolGAN(learning_rate=ExponentialDecay(0.0001, 0.9, 5000), vertices=max_atoms, model_dir="model_dir")
  • tried MolGAN(learning_rate=ExponentialDecay(0.001, 0.9, 5000), vertices=max_atoms, model_dir="model_dir")
  • code to fit: gan.fit_gan(iterbatches(25), generator_steps=0.2, checkpoint_interval=1000, max_checkpoints_to_keep=25, restore=False)
  • dataset.X.shape is (235043, 31, 31)
  • dataset.y.shape is (235043, 31)
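
Roughly, the setup looks like this (a sketch reconstructed from the linked tutorial; the exact imports, attribute names, and the "smiles" CSV column are assumptions rather than lines copied from my notebook):

import pandas as pd
import tensorflow as tf
import deepchem as dc
from deepchem.models import BasicMolGANModel as MolGAN
from deepchem.models.optimizers import ExponentialDecay
from deepchem.feat.molecule_featurizers.molgan_featurizer import GraphMatrix

max_atoms = 31
atom_labels = [0, 6, 7, 8, 9, 15, 16, 17, 35, 53]

# Featurize ZINC250k SMILES into MolGAN graph matrices, dropping failed molecules
df = pd.read_csv("250k_rndm_zinc_drugs_clean_3.csv")
featurizer = dc.feat.MolGanFeaturizer(max_atom_count=max_atoms, atom_labels=atom_labels)
features = [f for f in featurizer.featurize(df["smiles"].str.strip().tolist())
            if isinstance(f, GraphMatrix)]

# X holds adjacency matrices (N, 31, 31); y holds node features (N, 31)
dataset = dc.data.NumpyDataset([f.adjacency_matrix for f in features],
                               [f.node_features for f in features])

gan = MolGAN(learning_rate=ExponentialDecay(0.0001, 0.9, 5000),
             vertices=max_atoms, model_dir="model_dir")

def iterbatches(epochs):
    # One-hot encode adjacency and node tensors per batch, as in the tutorial
    for _ in range(epochs):
        for batch in dataset.iterbatches(batch_size=gan.batch_size, pad_batches=True):
            adjacency_tensor = tf.one_hot(batch[0], gan.edges)
            node_tensor = tf.one_hot(batch[1], gan.nodes)
            yield {gan.data_inputs[0]: adjacency_tensor,
                   gan.data_inputs[1]: node_tensor}

gan.fit_gan(iterbatches(25), generator_steps=0.2, checkpoint_interval=1000,
            max_checkpoints_to_keep=25, restore=False)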

When I fit the model to the data, the generator loss keeps growing:

epoch-0
Ending global_step 999: generator average loss 11.9274, discriminator average loss -23.6026
Ending global_step 1999: generator average loss 22.0532, discriminator average loss -44.0165

  4%|▍         | 1/25 [01:09<27:49, 69.55s/it]

epoch-1
Ending global_step 2999: generator average loss 31.1893, discriminator average loss -62.301
Ending global_step 3999: generator average loss 40.2527, discriminator average loss -80.4343

  8%|▊         | 2/25 [02:13<25:21, 66.16s/it]

epoch-2
Ending global_step 4999: generator average loss 49.3043, discriminator average loss -98.5299
Ending global_step 5999: generator average loss 58.3535, discriminator average loss -116.639
Ending global_step 6999: generator average loss 67.4018, discriminator average loss -134.74

 12%|█▏        | 3/25 [03:20<24:24, 66.59s/it]

epoch-3
Ending global_step 7999: generator average loss 76.4501, discriminator average loss -152.837
Ending global_step 8999: generator average loss 85.4984, discriminator average loss -170.934

 16%|█▌        | 4/25 [04:25<23:02, 65.82s/it]

epoch-4
Ending global_step 9999: generator average loss 94.5468, discriminator average loss -189.033
Ending global_step 10999: generator average loss 103.595, discriminator average loss -207.131

 20%|██        | 5/25 [05:30<21:56, 65.83s/it]

epoch-5
Ending global_step 11999: generator average loss 112.644, discriminator average loss -225.23
Ending global_step 12999: generator average loss 121.692, discriminator average loss -243.326
Ending global_step 13999: generator average loss 130.741, discriminator average loss -261.422

 24%|██▍       | 6/25 [06:41<21:19, 67.36s/it]

epoch-6
Ending global_step 14999: generator average loss 139.789, discriminator average loss -279.521
Ending global_step 15999: generator average loss 148.838, discriminator average loss -297.618

 28%|██▊       | 7/25 [07:45<19:51, 66.21s/it]

epoch-7
Ending global_step 16999: generator average loss 157.886, discriminator average loss -315.715
Ending global_step 17999: generator average loss 166.935, discriminator average loss -333.813

@rbharath (Member) commented:

CC @shreyasvinaya who may have some insights

@shreyasvinaya (Member) commented:

Hi @UmarZein,
The losses reported in the terminal are actually running sums rather than per-step losses, so the growth is relatively harmless. Please run the training to completion and let me know if it finishes successfully.
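
If they are running sums, the per-interval averages can be recovered by differencing consecutive reports. A minimal illustration (not DeepChem code; the numbers are copied from the first three reports in the log above):

reports = [
    (999, 11.9274, -23.6026),
    (1999, 22.0532, -44.0165),
    (2999, 31.1893, -62.3010),
]

prev_step, prev_gen, prev_disc = 0, 0.0, 0.0
for step, gen_sum, disc_sum in reports:
    n = step - prev_step  # steps covered by this report
    print(f"steps {prev_step + 1}-{step}: "
          f"generator {(gen_sum - prev_gen) / n:.4f}, "
          f"discriminator {(disc_sum - prev_disc) / n:.4f}")
    prev_step, prev_gen, prev_disc = step, gen_sum, disc_sum

On these numbers the per-step generator loss stays roughly flat (about 0.009-0.012), which is consistent with the running-sum interpretation.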

@UmarZein (Author) commented:

I haven't been able to get a good run.
Previously it experienced mode collapse, but today I tried again and generation failed completely.
Notebook: https://gist.github.com/UmarZein/d9613e44922b2cba4199f8175d3b89f2
I am not familiar with TensorFlow: I am using runpod.io's TensorFlow image/template, and a couple of warnings were printed in red.
When I run import tensorflow as tf; print(tf.config.list_physical_devices('GPU')):

2024-04-30 06:38:19.185759: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-30 06:38:19.185804: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-30 06:38:19.186730: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-30 06:38:19.291660: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

When running gan = MolGAN(learning_rate=ExponentialDecay(0.0001, 0.9, 5000), vertices=max_atoms, model_dir="model_dir"):

2024-04-30 06:53:18.816166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 18452 MB memory:  -> device: 0, name: NVIDIA RTX 4000 Ada Generation, pci bus id: 0000:82:00.0, compute capability: 8.9

After that line, it consumes 18 GB of VRAM (I don't know whether that is the model's actual footprint or whether something is wrong with the Docker container).
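
(Possibly relevant: TensorFlow by default reserves most of the GPU's memory as soon as the device is created, so the 18 GB figure may just be the default allocator behaviour rather than the model's real footprint. A small sketch, using the standard TensorFlow API and run before anything touches the GPU, that switches to on-demand allocation:)

import tensorflow as tf

# Grow GPU memory usage on demand instead of grabbing (almost) all VRAM
# when the device is first created.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)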

@rbharath (Member) commented:

@shreyasvinaya Is the new torch molgan stable to run?
