DrugAI

Drug Efficacy Prediction using Graph Neural Network

We implemented three GNN models for efficacy prediction:

DMPNN
GCN
GIN, see also GINEConv

Dependency

numpy
pandas
python >= 3.7
PyTorch >= 1.5
PyTorch Geometric >= 1.7
RDKit
optuna: (optional) hyperparameter search

Usage

Input data

At least one file is required:

a CSV file with

first column: SMILES
second to last columns: target values, float (regression) or int (classification)

(optional) a pickle file containing a tuple (train, test, val) of row indices (splits).
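The two input files described above can be produced with a few lines of standard-library Python. This is a minimal sketch under assumptions: the file names, the header names (`smiles`, `target_1`, `target_2`), the 80/10/10 split ratio, and the use of plain Python lists for the indices are illustrative and not taken from the repository. Only the column layout and the (train, test, val) tuple order come from the description above.

```python
import csv
import os
import pickle
import random
import tempfile

# Hypothetical output locations; substitute your own paths.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "example_data.csv")
split_path = os.path.join(out_dir, "example_splits.pkl")

# Toy regression data: SMILES in the first column, float targets in the rest.
smiles = ["CCO", "CCN", "CCC", "c1ccccc1", "CC(=O)O",
          "CCCl", "CCBr", "CO", "CN", "C1CCCCC1"]
rows = [(s, round(0.1 * i, 2), round(1.0 - 0.1 * i, 2)) for i, s in enumerate(smiles)]

with open(csv_path, "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["smiles", "target_1", "target_2"])  # assumed header names
    writer.writerows(rows)

# Shuffle the row indices reproducibly and cut them 80/10/10 (assumed ratio).
indices = list(range(len(rows)))
random.Random(0).shuffle(indices)
n_train = int(0.8 * len(indices))
n_test = int(0.1 * len(indices))
train = indices[:n_train]
test = indices[n_train:n_train + n_test]
val = indices[n_train + n_test:]

# The tuple order follows the description above: (train, test, val).
with open(split_path, "wb") as fh:
    pickle.dump((train, test, val), fh)
```

Whether the training script expects lists or numpy arrays of indices is not stated here, so check `drug_gnn/train.py` before relying on this layout.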

1. Train

python drug_gnn/train.py --data_path ${data} \
                        --task ${regression} \
                        --gnn_type dmpnn --log_dir checkpoints/dmpnn

2. Predict

python drug_gnn/predict.py --data_path ${data} \
                        --task ${regression} \
                        --gnn_type dmpnn --log_dir checkpoints/dmpnn

Hyperparameter tuning

python drug_gnn/hyperopt.py --data_path ${data} --task ${regression}  \
                            --gnn_type dmpnn \
                            --hyperopt_dir hyper_dmpnn

Drug Efficacy prediction

Train your model using LINCS 2020 data
- input data format:
  - shape: (num_smiles, num_landmark_genes)
  - the first column contains SMILES strings
  - the remaining columns are expression values
  - column names should be Entrez IDs
- save best_model

The prediction step generates two output files
- embeddings for each molecule: xxx.embeddings.npy
- predicted landmark gene expression: xxx.pred.exprs.csv
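
Before training, it can help to confirm that the input table follows the layout described above. The sketch below is an illustration only: `check_lincs_header` and `read_header` are hypothetical helpers, not part of this repository, and they assume that Entrez IDs appear as plain integers in the header.

```python
import csv
import io


def read_header(handle):
    """Read only the header row of a CSV file-like object."""
    return next(csv.reader(handle))


def check_lincs_header(header):
    """Return the expression column names that are not plain integer Entrez IDs."""
    # The first column holds SMILES strings, so only the remaining names are checked.
    return [name for name in header[1:] if not name.strip().isdigit()]


# Toy table: two valid Entrez IDs and one gene symbol that should be flagged.
example = io.StringIO("smiles,7157,1956,TP53\nCCO,0.1,-0.3,0.8\n")
bad_columns = check_lincs_header(read_header(example))
print(bad_columns)  # ['TP53']
```

For a real file, pass `open(path, newline="")` to `read_header` instead of the in-memory example.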

Efficacy Score:

1. Prepare up- and down-regulated gene signatures (Entrez IDs only): up.txt, down.txt

2. Get the transform matrix: GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx

   # convert to pandas DataFrame
   from cmapPy.pandasGEXpress.parse import parse
   weight = parse('GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx').data_df
   weight.to_csv("GSE92743_Broad_OLS_WEIGHTS_n979x11350.csv")

3. Use the predicted output from step 2 (Predict): xxx.pred.exprs.csv
4. Run:

python efficacy.py --weights GSE92743_Broad_OLS_WEIGHTS_n979x11350.csv \
                   --predicts xxx.pred.exprs.csv \
                   --up up.txt \
                   --down down.txt \
                   --output efficacy.csv
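
The inputs above suggest a two-stage calculation: a linear map from predicted landmark expression to a larger gene set, followed by a score against the up and down signatures. The sketch below is one plausible reading of that pipeline, not the formula used by `efficacy.py`. It assumes the weight table has one row per landmark feature and one column per inferred gene (the real file's orientation and any intercept row should be checked), and it uses a simple reversal score, the mean of down-signature genes minus the mean of up-signature genes. The names `infer_genes` and `reversal_score` are hypothetical.

```python
import numpy as np


def infer_genes(landmark_pred, weights):
    """Linearly map landmark expression to inferred gene expression.

    landmark_pred: (n_compounds, n_landmarks)
    weights:       (n_landmarks, n_genes), assumed orientation
    """
    return landmark_pred @ weights


def reversal_score(expr, gene_ids, up, down):
    """Per-compound mean of down-signature genes minus mean of up-signature genes.

    A higher value means the compound raises the 'down' genes and lowers the
    'up' genes. This scoring rule is an assumption made for illustration.
    """
    column = {gene: i for i, gene in enumerate(gene_ids)}
    # Signature genes missing from the expression table are skipped.
    up_idx = [column[g] for g in up if g in column]
    down_idx = [column[g] for g in down if g in column]
    return expr[:, down_idx].mean(axis=1) - expr[:, up_idx].mean(axis=1)


# Toy data: 2 compounds, 2 landmark genes, 3 inferred genes.
landmark_pred = np.array([[1.0, 0.0],
                          [0.0, 1.0]])
weights = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, -1.0]])
gene_ids = ["10", "20", "30"]  # placeholder Entrez-style IDs

expr = infer_genes(landmark_pred, weights)
scores = reversal_score(expr, gene_ids, up=["10"], down=["20"])
```

In this toy case the second compound scores higher because it raises the down-signature gene while leaving the up-signature gene at zero. If a signature shares no genes with the table, the mean is undefined and the result is NaN, so real use would need a guard for that case.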

Results

Average Pearson correlation (AUC-like plot), suggesting that the GNN predicts transcriptional profiles well
t-SNE plot of drug embeddings

Distribution of Pearson correlation coefficients
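
For reference, a Pearson correlation between predicted and measured profiles can be computed row by row as sketched below. This is a generic illustration: whether the figures above correlate per compound or per gene is not stated here, and `rowwise_pearson` is a hypothetical helper, not a function from this repository.

```python
import numpy as np


def rowwise_pearson(pred, true):
    """Pearson correlation between matching rows of two (n_rows, n_genes) arrays."""
    # Center each row, then divide the covariance term by the product of norms.
    pred_c = pred - pred.mean(axis=1, keepdims=True)
    true_c = true - true.mean(axis=1, keepdims=True)
    numerator = (pred_c * true_c).sum(axis=1)
    denominator = np.sqrt((pred_c ** 2).sum(axis=1) * (true_c ** 2).sum(axis=1))
    return numerator / denominator


# Toy profiles: the first prediction is perfectly correlated with the
# measurement, the second is perfectly anti-correlated.
true = np.array([[1.0, 2.0, 3.0],
                 [1.0, 2.0, 3.0]])
pred = np.array([[2.0, 4.0, 6.0],
                 [3.0, 2.0, 1.0]])
r = rowwise_pearson(pred, true)
```

Passing the transposed arrays gives one coefficient per gene instead of one per compound. Rows with zero variance produce a division by zero and would need separate handling.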

Contact

Zhuoqing Fang: fangzq@stanford.edu

Others

This project is based on chemprop and chiral_gnn.
