Running Baselines

Prerequisites

Make sure you have generated the GDB9 data splits by following the instructions in the README located in the root of this repository.
Initialize the Git submodules containing the code for the baselines by executing:
```
git submodule update --init
```
Set up the environments for the different models:
- CGVAE and MolGAN: `conda env create -n cgvae --file requirements_cgvae.yaml`
- GrammarVAE: `conda env create -n gvae --file requirements_grammarvae.yaml`
- NeVAE: use the same environment as for ALMGIG (see the README located in the root of this repository).

Constrained Graph Variational Autoencoder (CGVAE)

Go to the CGVAE/data directory and update fname at the bottom of the get_qm9.py file to point to the generated splits of the GDB9 data.
Create JSON data:
```
python get_qm9.py
```
Go to the CGVAE directory and train the model:
```
python CGVAE.py --dataset qm9
```

Sample molecules:
```
python CGVAE.py --dataset qm9 \
    --restore "10_qm9.pickle" \
    --config '{"generation": true, "number_of_generation": 10000}'
```

The file generated_smiles_qm9.txt will contain generated molecules in SMILES format.
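Before evaluation, a quick summary of the generated file can help catch an empty or truncated run. The following is a minimal sketch that assumes generated_smiles_qm9.txt contains one SMILES string per line. The helper name `summarize_smiles_file` is hypothetical. Strings are compared verbatim, so two different spellings of the same molecule count as distinct. Canonicalizing the strings with a chemistry toolkit first would give a stricter count.

```python
from pathlib import Path


def summarize_smiles_file(path):
    """Count non-empty and unique SMILES strings in a text file.

    Assumes one SMILES string per line (hypothetical layout). Strings are
    compared verbatim, without canonicalization.
    """
    lines = Path(path).read_text().splitlines()
    smiles = [line.strip() for line in lines if line.strip()]
    unique = set(smiles)
    return {
        "total": len(smiles),
        "unique": len(unique),
        "unique_fraction": len(unique) / len(smiles) if smiles else 0.0,
    }
```

For example, run `summarize_smiles_file("generated_smiles_qm9.txt")` from the CGVAE directory after sampling.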

MolGAN

Go to the MolGAN/data directory and run:
```
wget https://github.com/gablg1/ORGAN/raw/master/organ/NP_score.pkl.gz
wget https://github.com/gablg1/ORGAN/raw/master/organ/SA_score.pkl.gz
```

Go to the MolGAN directory and generate the data:
```
python utils/sparse_molecular_dataset.py \
    --train "../../data/gdb9/graphs/gdb9_train.smiles" \
    --validation "../../data/gdb9/graphs/gdb9_valid.smiles" \
    --test "../../data/gdb9/graphs/gdb9_test.smiles" \
    --output "data/qm9-mysplits-data.pkl"
```
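The data-generation command uses paths relative to the MolGAN directory. A pre-flight check can catch a wrong working directory or missing splits before the conversion starts. The following is a sketch with a hypothetical helper, `missing_split_files`. The file paths come from the command. Everything else is an assumption.

```python
from pathlib import Path

# Split files passed to utils/sparse_molecular_dataset.py, relative to the
# MolGAN directory (paths copied from the command above).
SPLITS = (
    "../../data/gdb9/graphs/gdb9_train.smiles",
    "../../data/gdb9/graphs/gdb9_valid.smiles",
    "../../data/gdb9/graphs/gdb9_test.smiles",
)


def missing_split_files(base_dir=".", splits=SPLITS):
    """Return the split files that do not exist relative to base_dir."""
    base = Path(base_dir)
    return [s for s in splits if not (base / s).is_file()]
```

Calling `missing_split_files()` from the MolGAN directory should return an empty list. Any entries it does return point to a split that has not been generated yet.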

Train the model:
```
python example.py
```

Sample molecules:
```
python predict.py \
    --model_dir "GraphGAN/norl/lam1/" \
    --number_samples 10000 \
    -o "generated_molecules.csv"
```

Generated molecules in SMILES format will be written to generated_molecules.csv.

NeVAE

Run the script get-gdb9-with-hydrogens.sh in the data directory located in the root of this repository.
Train the model by running train_nevae.sh. The script automatically samples a number of molecules once training has completed. The generated molecules are written in SMILES format to multiple CSV files in the models/nevae-poisson-masked directory.
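Because the NeVAE samples are spread across several CSV files, it can help to merge them into one list before evaluation. The following is a minimal sketch. It assumes each file has a header row and stores the SMILES string in the column passed as `column`. Both that column name and the helper `collect_smiles` are hypothetical, so check the header of the actual files first.

```python
import csv
from pathlib import Path


def collect_smiles(directory, column, pattern="*.csv"):
    """Read SMILES strings from every CSV file in directory matching pattern.

    Assumes each file has a header row containing `column` (hypothetical
    name; check the files written by train_nevae.sh). Files are read in
    sorted order so the result is deterministic; empty cells are skipped.
    """
    smiles = []
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(newline="") as fh:
            for row in csv.DictReader(fh):
                value = (row.get(column) or "").strip()
                if value:
                    smiles.append(value)
    return smiles
```

For example, call `collect_smiles("models/nevae-poisson-masked", column="smiles")`, replacing `"smiles"` with the real column name.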

GrammarVAE

Go to the data/gdb9/graphs folder in the root of this repository and concatenate the GDB9 splits into a single file:
```
cat gdb9_test.smiles gdb9_train.smiles gdb9_valid.smiles > gdb9.smiles
```
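You can also build gdb9.smiles from Python, for example on a system without `cat`. The following sketch concatenates the three split files in the same order as the command above. It also reports any line that occurs in more than one split, which would indicate overlapping splits. The helper name `concatenate_splits` is hypothetical, and it treats each non-empty line as one entry. Unlike `cat`, it adds a missing trailing newline, so the last entry of one split is not joined to the first entry of the next.

```python
from collections import Counter
from pathlib import Path

# Same order as: cat gdb9_test.smiles gdb9_train.smiles gdb9_valid.smiles
SPLIT_FILES = ("gdb9_test.smiles", "gdb9_train.smiles", "gdb9_valid.smiles")


def concatenate_splits(directory, output="gdb9.smiles", splits=SPLIT_FILES):
    """Write all split files into one file and return lines shared by splits.

    Each non-empty line is treated as one entry (assumption). A line is
    reported if it appears in more than one split file.
    """
    directory = Path(directory)
    seen_in = Counter()  # line -> number of split files containing it
    with (directory / output).open("w") as out:
        for name in splits:
            text = (directory / name).read_text()
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")
            seen_in.update({line for line in text.splitlines() if line.strip()})
    return sorted(line for line, count in seen_in.items() if count > 1)
```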
Go to the grammarVAE directory and open the file make_gdb9_dataset_grammar.py. Change f at the top of the file to point to the gdb9.smiles file created above, then run:
```
python make_gdb9_dataset_grammar.py
```
Train the model:
```
python train_gdb9.py
```
Sample molecules:
```
python sample_gdb9.py
```
Generated SMILES strings will be written to gdb9-generated.smi. Note that generated strings can be invalid SMILES.
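Some GrammarVAE outputs are not valid SMILES, so you may want to discard obviously malformed strings first. The following is a purely syntactic pre-filter. It is an assumption of this guide, not part of the baseline code. It checks that parentheses and square brackets are balanced and that every ring-closure label is used an even number of times. Passing it does not guarantee validity, because valences and aromaticity are not checked. A chemistry toolkit such as RDKit remains the reliable test: its `Chem.MolFromSmiles` returns `None` for strings it cannot parse.

```python
def looks_like_smiles(s):
    """Cheap syntactic check for a SMILES string (necessary, not sufficient).

    Checks balanced '(' ')' and '[' ']' and that each ring-closure label
    (a digit, or '%' followed by two digits) outside square brackets occurs
    an even number of times.
    """
    if not s:
        return False
    depth = 0         # current branch nesting level
    ring_counts = {}  # ring-closure label -> number of occurrences
    i = 0
    while i < len(s):
        c = s[i]
        if c == "[":
            end = s.find("]", i + 1)
            if end == -1 or "[" in s[i + 1:end]:
                return False  # unterminated or nested bracket atom
            i = end + 1       # digits inside brackets are not ring labels
            continue
        if c == "]":
            return False  # closing bracket without an opening one
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                return False
        elif c.isdigit():
            ring_counts[c] = ring_counts.get(c, 0) + 1
        elif c == "%":
            label = s[i + 1:i + 3]
            if len(label) != 2 or not label.isdigit():
                return False
            ring_counts[label] = ring_counts.get(label, 0) + 1
            i += 3
            continue
        i += 1
    return depth == 0 and all(n % 2 == 0 for n in ring_counts.values())
```

To apply it, read gdb9-generated.smi line by line and keep the lines for which `looks_like_smiles` returns `True`.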

Random Graph Generation

To generate random molecules that satisfy valence constraints, run:
```
python generate_random.py --output random_samples.csv
```

Molecules in SMILES format will be written to random_samples.csv.
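The idea behind valence-constrained random generation can be shown with a toy sketch. A bond between a randomly chosen pair of atoms is accepted only if neither atom would exceed its maximum valence. This is not the algorithm implemented in generate_random.py. The function name, the proposal scheme, and the valences assumed for the GDB9 heavy atoms (C 4, N 3, O 2, F 1, neutral atoms) are illustrative assumptions.

```python
import random

# Maximum valences of the heavy atoms occurring in GDB9 (assumption:
# neutral atoms; implicit hydrogens fill any remaining valence).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}


def random_molecular_graph(atoms, n_proposals=50, rng=None):
    """Randomly add bonds while never exceeding any atom's valence.

    atoms: list of element symbols, e.g. ["C", "C", "O"].
    Returns a dict mapping (i, j) with i < j to a bond order in {1, 2, 3}.
    The resulting graph is not guaranteed to be connected.
    """
    rng = rng or random.Random()
    used = [0] * len(atoms)  # bond-order sum per atom so far
    bonds = {}
    for _ in range(n_proposals):
        if len(atoms) < 2:
            break
        i, j = sorted(rng.sample(range(len(atoms)), 2))
        if (i, j) in bonds:
            continue  # keep at most one bond per atom pair
        order = rng.choice((1, 2, 3))
        free_i = MAX_VALENCE[atoms[i]] - used[i]
        free_j = MAX_VALENCE[atoms[j]] - used[j]
        if order <= min(free_i, free_j):  # valence constraint
            bonds[(i, j)] = order
            used[i] += order
            used[j] += order
    return bonds
```

A full generator would also need to enforce connectivity and convert the graph to SMILES, for example with a chemistry toolkit.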
