MLlab4CS/DeepGLSTM

DeepGLSTM: Deep Graph Convolutional Network and LSTM based approach for predicting drug-target binding affinity

Quick Links

  1. Abstract
  2. Model Architecture
  3. Preparation
    1. Environment Setup
    2. Dataset description
  4. Quick Start
    1. Create Dataset
    2. Model Training
    3. Inference on Pretrained Model
  5. Pretrained Models and Dataset
    1. Pretrained Models download links
    2. Dataset download links
  6. Model Performance Stats
  7. Case studies on SARS-CoV-2 viral proteins
  8. Citation

Abstract

Development of new drugs is an expensive and time-consuming process. Due to the world-wide SARS-CoV-2 outbreak, it is essential that new drugs for SARS-CoV-2 are developed as soon as possible. Drug repurposing techniques can reduce the time span needed to develop new drugs by probing the list of existing FDA-approved drugs and their properties to reuse them for combating the new disease. We propose a novel architecture DeepGLSTM, which is a Graph Convolutional network and LSTM based method that predicts binding affinity values between the FDA-approved drugs and the viral proteins of SARS-CoV-2. Our proposed model has been trained on Davis, KIBA (Kinase Inhibitor Bioactivity), DTC (Drug Target Commons), Metz, ToxCast and STITCH datasets. We use our novel architecture to predict a Combined Score (calculated using Davis and KIBA score) of 2,304 FDA-approved drugs against 5 viral proteins. On the basis of the Combined Score, we prepare a list of the top-18 drugs with the highest binding affinity for 5 viral proteins present in SARS-CoV-2. Subsequently, this list may be used for the creation of new useful drugs. For more details please visit our work.

Model Architecture

[Figure: DeepGLSTM model architecture]

Preparation

Environment Setup

Install the dependency packages with the following command.

pip install -r requirements.txt
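
After installation, a quick import check can confirm that the environment is usable. The helper below is a minimal sketch and is not part of the repository. The module names `torch`, `torch_geometric`, and `rdkit` are assumptions about what requirements.txt installs (this README only mentions PyTorch Geometric), so adjust the list to match that file.

```python
import importlib.util

# Assumed module names; requirements.txt is the authoritative list.
ASSUMED_MODULES = ["torch", "torch_geometric", "rdkit"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All assumed modules are importable.")
```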

Dataset description

Our experiments use six datasets: Davis, KIBA, DTC, Metz, ToxCast, and STITCH.

Dataset Statistics:

[Figure: dataset statistics]

Quick Start

Create Dataset

First, run the script below to create the PyTorch Geometric file. The file is written to the processed folder inside the data folder.

python3 data_creation.py 

The argument parser defaults are set for the Davis dataset.
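
To illustrate what an argument parser default means in practice, here is a minimal sketch using Python's standard argparse module. The `--dataset` flag and its choices are hypothetical and chosen only for illustration; the real argument names are defined inside data_creation.py, training.py, and inference.py.

```python
import argparse


def build_parser():
    """Hypothetical parser: the flag name and choices are illustrative only."""
    parser = argparse.ArgumentParser(description="Illustrative dataset selector")
    parser.add_argument(
        "--dataset",
        default="davis",  # used whenever the flag is omitted
        choices=["davis", "kiba", "dtc", "metz", "toxcast", "stitch"],
        help="dataset to process (hypothetical flag name)",
    )
    return parser


# No arguments given: the default applies.
print(build_parser().parse_args([]).dataset)  # prints "davis"
# An explicit value overrides the default.
print(build_parser().parse_args(["--dataset", "kiba"]).dataset)  # prints "kiba"
```

If the scripts use argparse, as the wording above suggests, running one of them with `--help` lists the actual options and their defaults.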

Model Training

Run the following script to train the model.

python3 training.py 

The argument parser defaults are set for the Davis dataset.

Inference on Pretrained Model

Run the following script to test the model.

python3 inference.py 

The argument parser defaults are set for the Davis dataset.
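
The three Quick Start steps can also be chained from a single Python driver. This is a convenience sketch and is not shipped with the repository. It assumes it is run from the repository root, and it calls each script with its default arguments. Whether inference.py then loads freshly trained or downloaded weights depends on that script's own arguments.

```python
import subprocess
import sys

# Script names come from this README; each runs with its default arguments.
STEPS = ["data_creation.py", "training.py", "inference.py"]


def build_commands(python=sys.executable):
    """Return one command per step, in the order given in the README."""
    return [[python, script] for script in STEPS]


def run_pipeline():
    """Run every step in order; check=True stops at the first failure."""
    for command in build_commands():
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Call `run_pipeline()` to execute all three steps. A failing step raises `subprocess.CalledProcessError`, so later steps do not run on incomplete output.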

Pretrained Models and Dataset

Pretrained Models download links

| Dataset | Model download link |
|---------|---------------------|
| Davis   | Link                |
| KIBA    | Link                |
| DTC     | Link                |
| Metz    | Link                |
| ToxCast | Link                |
| STITCH  | Link                |

Download the model for the dataset you need from the table above and store it in the pretrained_model folder.

Dataset download links

| Dataset | Dataset download link |
|---------|-----------------------|
| Davis   | Link                  |
| KIBA    | Link                  |
| DTC     | Link                  |
| Metz    | Link                  |
| ToxCast | Link                  |
| STITCH  | Link                  |

Download the dataset you need from the table above and store it in the data folder. Each linked folder contains two CSV files, one for training and one for testing.
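
Before training or inference, it can help to confirm that the downloads landed where the scripts expect them. The sketch below only checks that paths exist. The entries in `ASSUMED_LAYOUT` are hypothetical, because the exact file names inside each download are not stated here; edit them to match what you actually downloaded.

```python
from pathlib import Path


def missing_paths(root, relative_paths):
    """Return the relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in relative_paths if not (root / rel).exists()]


# Hypothetical layout for the Davis dataset; adjust to the downloaded files.
ASSUMED_LAYOUT = [
    "data/davis/train.csv",
    "data/davis/test.csv",
    "pretrained_model",
]

if __name__ == "__main__":
    for rel in missing_paths(".", ASSUMED_LAYOUT):
        print("Not found:", rel)
```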

Model Performance Stats

Plots of DeepGLSTM-predicted versus measured binding affinity values for the (a) Davis, (b) KIBA, (c) DTC, (d) Metz, (e) ToxCast, and (f) STITCH datasets. In the figure, Coef_V is the Pearson correlation coefficient.
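
For readers who want to reproduce a correlation value like Coef_V from their own predictions, the Pearson correlation coefficient can be computed as sketched below. This uses the standard textbook formula and is not taken from the repository's evaluation code. The sample numbers are made up for illustration and are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up affinity values, for illustration only.
measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.3, 6.0, 7.4, 8.1]
print(round(pearson(measured, predicted), 3))
```

A value close to 1 means the predictions rise and fall with the measurements. It does not by itself show that the predicted values are close in absolute terms.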

Case studies on SARS-CoV-2 viral proteins

[Figures: case-study results for SARS-CoV-2 viral proteins]

Citation

Please cite our paper if it's helpful to you in your research.

@inbook{doi:10.1137/1.9781611977172.82,
author = {Shrimon Mukherjee and Madhusudan Ghosh and Partha Basuchowdhuri},
title = {DeepGLSTM: Deep Graph Convolutional Network and LSTM based approach for predicting drug-target binding affinity},
booktitle = {Proceedings of the 2022 SIAM International Conference on Data Mining (SDM)},
chapter = {},
pages = {729-737},
doi = {10.1137/1.9781611977172.82},
URL = {https://epubs.siam.org/doi/abs/10.1137/1.9781611977172.82},
eprint = {https://epubs.siam.org/doi/pdf/10.1137/1.9781611977172.82},
}
