drigoni/ComparisonsDGM

ComparisonsDGM

This repository contains the code used to generate the results reported in the paper: A Systematic Assessment of Deep Learning Models for Molecule Generation.

@article{rigoni2020systematic,
  title={A Systematic Assessment of Deep Learning Models for Molecule Generation},
  author={Rigoni, Davide and Navarin, Nicol{\`o} and Sperduti, Alessandro},
  journal={arXiv preprint arXiv:2008.09168},
  year={2020}
}

Overview

Folders whose names begin with the "_" character contain the code and supporting files needed to test all the models. The remaining folders contain the original models. For more information about a model, see the README.md file inside its folder.

Dependencies

This project uses conda environments. For each model, the _environments folder contains the .yml file that configures the conda environment and the .txt file that lists the pip dependencies. Note that some dependency versions can cause problems when configuring an environment. For this reason, although a setup.bash file is provided to configure each project, it is better to configure the environments manually.

NOTE: some environments may be configured for CPU only. In that case, to use the GPU, replace the tensorflow line in the environment file with tensorflow-gpu. Depending on the model, some lines of code must also be changed.
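The replacement described in the note can be scripted. The following is a minimal sketch, not part of the repository: the helper name `use_gpu_tensorflow` is hypothetical, and it assumes the dependency appears on its own line as `tensorflow`, optionally preceded by a YAML list dash and followed by a version specifier. It does not cover the model-specific code changes mentioned above.

```python
import re

# Matches a dependency line that names exactly "tensorflow", for example
# "  - tensorflow=<version>" in a conda .yml file or "tensorflow==<version>"
# in a pip .txt file. Packages such as "tensorflow-gpu" or
# "tensorflow-estimator" are deliberately not matched.
_TF_LINE = re.compile(
    r"^([ \t]*(?:-[ \t]*)?)tensorflow(?=[ \t]*(?:[=<>!~]|$))",
    re.MULTILINE,
)


def use_gpu_tensorflow(env_text: str) -> str:
    """Return env_text with the tensorflow package renamed to tensorflow-gpu.

    Hypothetical helper: the layout of the environment files is an assumption.
    """
    return _TF_LINE.sub(r"\1tensorflow-gpu", env_text)
```

To apply it, read the environment file into a string, pass it through the function, and write the result back before creating the environment.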

Structure

The project is structured as follows:

  • _analysis: contains the code that tests the molecules generated by the models, as well as the code that analyzes the datasets;
  • _datasets: contains the datasets QM9 and ZINC;
  • _environments: contains the file setup.bash used to configure each environment;
  • _utils: contains all the utility code;
  • gVAE: contains both Character VAE and Grammar VAE code;
  • sdVAE: contains the Syntax Directed VAE code;
  • molGAN: contains the MolGAN code;
  • rGVAE: contains the Regularized Graph VAE code;
  • jtVAE: contains the Junction Tree VAE code;
  • constrainedGVAE: contains the Constrained Graph VAE code.

Usage

Data Download

First you need to download the necessary files by running the following commands:

cd _datasets/QM9
sh download_dataset.sh

For both datasets, the test set consists of the first 5000 molecules. Since each model can use a different validation procedure, the split of the remaining molecules into validation and training sets is left to each model, following the code used by the model's author.
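The split rule can be written down in a few lines. This is a minimal sketch under the assumption that the molecules are available as an ordered sequence of strings (for example SMILES) in dataset order; the function name `split_test_set` is hypothetical and does not come from the repository.

```python
from typing import List, Sequence, Tuple

# Stated in the text: the first 5000 molecules form the test set.
TEST_SET_SIZE = 5000


def split_test_set(
    molecules: Sequence[str], n_test: int = TEST_SET_SIZE
) -> Tuple[List[str], List[str]]:
    """Return (test, remaining), where test holds the first n_test molecules.

    How `remaining` is divided into training and validation sets is left
    to each model, so this sketch does not split it further.
    """
    test = list(molecules[:n_test])
    remaining = list(molecules[n_test:])
    return test, remaining
```

Keeping the test set as a fixed prefix of the dataset means every model is evaluated on the same molecules, whatever validation procedure it uses on the rest.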

Model Training

Training and molecule generation require running the model code in the corresponding folder. For new models, remember to add functions for reading and saving molecules that follow the implementation used in the existing models. Each model folder contains a README.md file that links to the original code repository. Refer to the original repository for the commands used to train the model.
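As an illustration of what such reading and saving functions might look like for a new model, here is a minimal sketch. It assumes the molecules are stored as text with one molecule string (for example SMILES) per line; the actual format used by the existing models is not documented here and should be checked in their code. The names `save_molecules` and `read_molecules` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def save_molecules(molecules: Iterable[str], path: str) -> None:
    """Write one molecule string per line (assumed file format)."""
    text = "".join(m.strip() + "\n" for m in molecules)
    Path(path).write_text(text, encoding="utf-8")


def read_molecules(path: str) -> List[str]:
    """Read molecule strings back, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

A file written with `save_molecules(generated, "molecules.txt")` can then be loaded again with `read_molecules("molecules.txt")`.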

Model Test

Once the molecules have been generated with a model and saved in the molecules.txt file, you can use the files in the _analysis/models folder to calculate their properties. The file model_results_generation.py analyzes the new molecules generated by sampling from the latent space, while model_results_bias.py performs the reconstruction analysis.

For example, given $path, the full path to the ComparisonsDGM folder, and $my_folder, the name of the folder in which to save the results:

conda activate analysis
cd _analysis/models
python model_results_bias.py $my_folder $path/gVAE/results/qm9_vae_str_L56_E100_val_decRes.txt qm9
python model_results_bias.py $my_folder $path/gVAE/results/zinc_vae_str_L56_E100_val_decRes.txt zinc

The results are saved in the folder $path/_analysis/models/[bias, generation]/$my_folder/, under bias or generation depending on the script used.
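To locate the output from another script, the results folder can be assembled from the same pieces. The helper below is a hypothetical sketch, not part of the repository; it only builds the path described above and assumes that the analysis name is either `bias` or `generation`.

```python
from pathlib import Path

# The two analysis subfolders named in the text.
ANALYSES = ("bias", "generation")


def results_dir(path: str, analysis: str, my_folder: str) -> Path:
    """Build <path>/_analysis/models/<analysis>/<my_folder>.

    `path` is the full path to the ComparisonsDGM folder and `my_folder`
    is the results folder name passed to the analysis script.
    """
    if analysis not in ANALYSES:
        raise ValueError(f"analysis must be one of {ANALYSES}, got {analysis!r}")
    return Path(path) / "_analysis" / "models" / analysis / my_folder
```

For the commands shown above, which run model_results_bias.py, the analysis name would be `bias`.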

Information

For any questions and comments, contact Davide Rigoni.

License

MIT