
zqfang/drugai


DrugAI

Drug Efficacy Prediction using Graph Neural Networks

GNN

We implemented three GCN models for efficacy prediction. The model is selected with --gnn_type; the examples below use dmpnn.
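To make the idea concrete, here is a minimal NumPy sketch of a generic graph convolution (symmetric-normalized adjacency with self-loops), mean-pooled into a molecule embedding and fed to a linear head with one output per L1000 landmark gene (978). It is not the repository's dmpnn model; the toy molecule, atom features, layer sizes, and random untrained weights are all hypothetical.

```python
import numpy as np


def gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))      # diagonal of D^-1/2
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalization
    return np.maximum(norm_adj @ h @ w, 0.0)           # aggregate neighbours, transform, ReLU


def molecule_embedding(adj: np.ndarray, h: np.ndarray, weights: list) -> np.ndarray:
    """Apply stacked GCN layers, then mean-pool atom states into one vector."""
    for w in weights:
        h = gcn_layer(adj, h, w)
    return h.mean(axis=0)


# Hypothetical toy molecule: ethanol heavy atoms C-C-O as a 3-node path graph.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
# Hypothetical one-hot atom features: [is_carbon, is_oxygen]
atom_features = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)

rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 8)), rng.normal(size=(8, 8))]  # random, untrained
embedding = molecule_embedding(adj, atom_features, weights)    # shape (8,)

# Hypothetical regression head: one output per L1000 landmark gene.
head = rng.normal(size=(8, 978))
predicted_expression = embedding @ head                        # shape (978,)
print(embedding.shape, predicted_expression.shape)
```

Mean pooling makes the embedding independent of atom ordering, so the same molecule with its atoms listed in a different order yields the same vector.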

Dependencies

  • numpy
  • pandas
  • Python >= 3.7
  • PyTorch >= 1.5
  • PyTorch Geometric >= 1.7
  • RDKit
  • Optuna (optional): hyperparameter search

Usage

Input data

At least one input file is required:

  1. a CSV file with
  • first column: SMILES strings
  • second to last columns: target values, float (regression) or int (classification)
  2. (optional) a pickle file containing a tuple (train, test, val) of row indices that defines the data splits.
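The sketch below writes a toy input file in the layout above, plus the optional splits pickle. The file names (data.csv, splits.pkl), column names, SMILES, target values, and the 80/10/10 split ratio are hypothetical; the tuple order (train, test, val) follows this README, but whether the loader expects Python lists or arrays is an assumption.

```python
import pickle

import numpy as np
import pandas as pd

# Hypothetical data: SMILES in the first column, float (regression) targets after it.
df = pd.DataFrame({
    "smiles": ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCCC",
               "O=C=O", "CC(C)O", "C1CCCCC1", "CCOC", "N#N"],
    "target_1": np.linspace(0.0, 0.9, 10),
    "target_2": np.linspace(1.0, 1.9, 10),
})
df.to_csv("data.csv", index=False)

# Shuffle row indices and split 80/10/10 (ratios are an assumption).
rng = np.random.default_rng(42)
idx = rng.permutation(len(df))
n_train, n_test = int(0.8 * len(df)), int(0.1 * len(df))
train = idx[:n_train].tolist()
test = idx[n_train:n_train + n_test].tolist()
val = idx[n_train + n_test:].tolist()

# Tuple order as described in this README: (train, test, val).
with open("splits.pkl", "wb") as fh:
    pickle.dump((train, test, val), fh)
```

data.csv can then be passed as --data_path; the flag for the splits file is not documented here, so check the script's argument parser.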

1. Train

python drug_gnn/train.py --data_path ${data} \
                        --task regression \
                        --gnn_type dmpnn --log_dir checkpoints/dmpnn

2. Predict

python drug_gnn/predict.py --data_path ${data} \
                        --task regression \
                        --gnn_type dmpnn --log_dir checkpoints/dmpnn

Hyperparameter tuning

python drug_gnn/hyperopt.py --data_path ${data} --task regression \
                            --gnn_type dmpnn \
                            --hyperopt_dir hyper_dmpnn

Drug Efficacy prediction

  1. Train the model on LINCS 2020 data

    • input data format:
      • shape: (num_smiles, num_landmark_genes), not counting the SMILES column
      • the first column contains SMILES strings
      • the remaining columns contain expression values
      • column names must be Entrez gene IDs
    • save the best model (best_model)
  2. The prediction step generates two output files:

    • embeddings for each molecule: xxx.embeddings.npy
    • predicted landmark gene expression: xxx.pred.exprs.csv
  3. Efficacy score:

      1. Prepare up- and down-regulated gene signatures (Entrez IDs only): up.txt, down.txt
      2. Get the transformation matrix GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx and convert it to CSV:
         # convert the .gctx file to a pandas DataFrame, then save it as CSV
         from cmapPy.pandasGEXpress.parse import parse
         weight = parse('GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx').data_df
         weight.to_csv("GSE92743_Broad_OLS_WEIGHTS_n979x11350.csv")
      3. Use the predicted expression output from step 2 (xxx.pred.exprs.csv)
      4. Run:
    python efficacy.py --weights GSE92743_Broad_OLS_WEIGHTS_n979x11350.csv \
                       --predicts xxx.pred.exprs.csv \
                       --up up.txt \
                       --down down.txt \
                       --output efficacy.csv
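This README does not spell out how efficacy.py computes the score. As a rough illustration only, the sketch below (1) extends the landmark predictions to the inferred genes with the OLS weights (inferred = intercept + landmark expression × weights) and (2) scores each molecule as mean(up genes) minus mean(down genes). The intercept row label OFFSET, the orientation check, the one-ID-per-line signature files, and the scoring rule are all assumptions; the real script may use a different statistic, such as a KS-based connectivity score.

```python
import numpy as np
import pandas as pd


def read_gene_list(path: str) -> list:
    """Read one Entrez ID per line (assumed format of up.txt / down.txt)."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def infer_full_profile(pred: pd.DataFrame, weights: pd.DataFrame,
                       intercept_row: str = "OFFSET") -> pd.DataFrame:
    """Append OLS-inferred genes to landmark predictions (sketch).

    pred:    molecules x landmark genes, columns are Entrez IDs
    weights: (intercept + landmarks) x inferred genes; transposed if the
             intercept label is found in the columns instead (assumption)
    """
    pred = pred.rename(columns=str)                     # compare IDs as strings
    weights = weights.copy()
    weights.index = weights.index.astype(str)
    weights.columns = weights.columns.astype(str)
    if intercept_row in weights.columns:
        weights = weights.T
    offset = weights.loc[intercept_row].to_numpy()      # one intercept per inferred gene
    w = weights.loc[list(pred.columns)].to_numpy()      # landmarks x inferred genes
    inferred = pred.to_numpy() @ w + offset
    inferred = pd.DataFrame(inferred, index=pred.index, columns=weights.columns)
    return pd.concat([pred, inferred], axis=1)


def signature_score(profiles: pd.DataFrame, up: list, down: list) -> pd.Series:
    """mean(up genes) - mean(down genes) per molecule; unknown IDs are skipped."""
    up = [g for g in up if g in profiles.columns]
    down = [g for g in down if g in profiles.columns]
    return profiles[up].mean(axis=1) - profiles[down].mean(axis=1)


# With real files this would be, e.g.:
#   pred = pd.read_csv("xxx.pred.exprs.csv", index_col=0)
#   weights = pd.read_csv("GSE92743_Broad_OLS_WEIGHTS_n979x11350.csv", index_col=0)
#   up, down = read_gene_list("up.txt"), read_gene_list("down.txt")
# Toy stand-ins (hypothetical IDs and values):
pred = pd.DataFrame([[1.0, 0.0], [0.0, 1.0]],
                    index=["mol_a", "mol_b"], columns=["1", "2"])
weights = pd.DataFrame([[0.5, 0.0, 0.0],    # intercept
                        [1.0, -1.0, 0.0],   # landmark gene "1"
                        [0.0, 1.0, 2.0]],   # landmark gene "2"
                       index=["OFFSET", "1", "2"], columns=["10", "11", "12"])

profiles = infer_full_profile(pred, weights)
scores = signature_score(profiles, up=["10", "12"], down=["11", "999"])
print(scores)  # mol_a 1.75, mol_b 0.25
```

Under this assumed convention, a negative score means the predicted profile runs opposite to the signature (up genes lowered, down genes raised).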

Results

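  1. The average Pearson correlation (AUC-like plot) shows that the GNN predicts transcriptional profiles well.
    [Figure: average Pearson correlation, AUC-like plot]

  2. t-SNE plot of drug embeddings

  3. Distribution of Pearson correlation coefficients

[Figure: distribution of Pearson correlation coefficients]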
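The README does not define how the average Pearson correlation or the AUC-like curve is computed. One plausible reading, sketched below on synthetic data: compute Pearson's r between the predicted and observed landmark profiles of each molecule, then take the fraction of molecules whose r exceeds each threshold. The function names, the synthetic data, and the thresholding scheme are assumptions.

```python
import numpy as np


def rowwise_pearson(pred: np.ndarray, obs: np.ndarray) -> np.ndarray:
    """Pearson r between predicted and observed profiles, one value per row."""
    p = pred - pred.mean(axis=1, keepdims=True)
    o = obs - obs.mean(axis=1, keepdims=True)
    return (p * o).sum(axis=1) / np.sqrt((p ** 2).sum(axis=1) * (o ** 2).sum(axis=1))


def fraction_above(r: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Share of molecules with r above each threshold (a non-increasing curve)."""
    return np.array([(r > t).mean() for t in thresholds])


# Synthetic stand-ins: 5 molecules x 978 landmark genes.
rng = np.random.default_rng(0)
observed = rng.normal(size=(5, 978))
predicted = observed + 0.5 * rng.normal(size=(5, 978))

r = rowwise_pearson(predicted, observed)
thresholds = np.linspace(-1.0, 1.0, 21)
curve = fraction_above(r, thresholds)
print("mean Pearson r:", r.mean())
```

Plotting curve against thresholds gives a survival-style curve; because r lies in [-1, 1], the area under it over that interval equals the mean r plus one, so a larger area means higher correlations overall.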

Contact

Zhuoqing Fang: fangzq@stanford.edu

Others

This project is based on chemprop and chiral_gnn.
