Systematically Exploring QSAR Models for Activity-Cliff Prediction

This repository contains code to reproduce the experiments from the paper "Systematically Exploring QSAR Models for Activity-Cliff Prediction". It also contains clean molecule and MMP data for all three analysed data sets (dopamine receptor D2, factor Xa, SARS-CoV-2 main protease), as well as the original numerical results from the experiments conducted in the paper.

Data Sets

The data folder contains three clean chemical data sets of small-molecule inhibitors of dopamine receptor D2, factor Xa, and SARS-CoV-2 main protease, respectively. Each data set is represented by two files: molecule_data_clean.csv and MMP_data_clean.csv. The first file contains SMILES strings with associated activity values, and the second file contains all matched molecular pairs (MMPs) identified within the first file.
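As a quick sanity check on the two files of one data set, a minimal sketch along the following lines may be useful. It uses only the Python standard library and makes no assumptions about column names; the function names `load_csv_rows` and `summarise_data_set`, and the idea that each data set sits in its own subfolder, are illustrative assumptions and not part of the repository.

```python
import csv
from pathlib import Path


def load_csv_rows(path):
    """Read a CSV file into a list of dicts keyed by its header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def summarise_data_set(data_set_dir):
    """Return (number of molecules, number of MMPs) for one data-set folder.

    Assumes (hypothetically) that the folder holds the two files named
    in the README: molecule_data_clean.csv and MMP_data_clean.csv.
    """
    data_set_dir = Path(data_set_dir)
    molecules = load_csv_rows(data_set_dir / "molecule_data_clean.csv")
    mmps = load_csv_rows(data_set_dir / "MMP_data_clean.csv")
    return len(molecules), len(mmps)
```

Calling `summarise_data_set` with the path to one data-set folder returns the two row counts, which can be compared with the numbers reported in the paper.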

Reproducing the Experiments

The experiments in the paper can be reproduced by running the code in the Jupyter notebook QSAR_activity_cliff_experiments.ipynb. First, the QSAR-, AC-, and PD-prediction tasks for the chosen data set are formally constructed in a data-preparation section. Then, an appropriate data split is conducted, both at the level of individual molecules and at the level of MMPs. Finally, a molecular representation (PDV, ECFP, or GIN) and a regression technique (RF, kNN, or MLP) are chosen, and the resulting model is trained and evaluated for QSAR-prediction, AC-classification, and PD-classification. The computational environment in which the original experiments were conducted is specified in environment.yml.
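To illustrate how a single trained regressor can serve the two pairwise tasks, the sketch below derives AC and PD labels for MMPs from per-molecule predictions. It is a sketch under stated assumptions, not the notebook's implementation: the function name `classify_mmps`, the `predict` callable, and the `ac_threshold` parameter are hypothetical, and the thresholding rule is one plausible way to turn predicted activity differences into labels.

```python
def classify_mmps(predict, mmps, ac_threshold):
    """Derive pairwise labels for MMPs from a per-molecule predictor.

    predict:      any callable mapping a SMILES string to a predicted
                  activity value (hypothetical stand-in for a QSAR model).
    mmps:         iterable of (smiles_a, smiles_b) pairs.
    ac_threshold: hypothetical cut-off on the absolute predicted
                  difference above which a pair is labelled a cliff.
    """
    results = []
    for smiles_a, smiles_b in mmps:
        diff = predict(smiles_a) - predict(smiles_b)
        results.append(
            {
                # AC label: is the predicted activity gap large enough?
                "is_cliff": abs(diff) >= ac_threshold,
                # PD label: which compound has the higher predicted value?
                "a_has_higher_value": diff > 0,
            }
        )
    return results
```

Any model with a per-molecule prediction function can be plugged in as `predict`, so the same routine works regardless of which representation and regression technique were chosen.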

Visually Investigating the Results

The experimental results can be visually explored using the visualise_results function at the end of QSAR_activity_cliff_experiments.ipynb. This function produces scatterplots such as the one in the graphical abstract above. The original numerical results from the paper are saved in the results folder; thus the original plots from the paper (and more) can be generated with visualise_results.


arunraja-hub/GDL-ActivityCliff-3D
