Skip to content

Repository for Distributed representations of graphs for drug pair scoring

License

Notifications You must be signed in to change notification settings

paulmorio/DrugPairScoringDR

Repository files navigation

DrugPairScoringDR

Arxiv

This is the source code repository for "Distributed representations of graphs for drug pair scoring". It also contains the distributed representations of molecular graphs learned over the drug sets in DrugCombDB, DrugComb, DrugbankDDI and TwoSides datasets.

Using our trained distributed representations of graphs

You can find our learned distributed representations in data/embeddings. Each of the embeddings are saved in .json files which contain a mapping from the drug to its distributed representation within the set.

Our naming convention is: <Dataset>_<SubstructurePattern>_<EmbeddingDimensionality>_<NumEpochsTrained>.json

Prerequisites and installation

There are several prerequisites to using our code, chief amongst them a modified version of ChemicalX called modded_chemicalx we include in this repository (all credit to original ChemicalX authors). This contains our DR augmented model implementations and augmented data loaders utilising the distributed representations and also reflects changes required due to changes in the APIs of the TorchDrug dependency. We include concrete installation instructions below.

We recommend installing packages in a conda virtual environment

conda create -n drvenv python=3.8
conda activate drvenv

We can install the required packages

CPU

conda install pytorch torchvision torchaudio cpuonly -c pytorch
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cpu.html
pip install torchdrug==0.1.2

Do a local installation of the modded_chemicalx package, followed by the Geo2DR package.

cd modded_chemicalx
pip install -e .
cd ..
pip install git+https://github.com/paulmorio/geo2dr.git

GPU

If you have a GPU you can follow these instructions

conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cu102.html
pip install torchdrug==0.1.2

Do a local installation of the modded_chemicalx package, followed by the Geo2DR package.

cd modded_chemicalx
pip install -e .
cd ..
pip install git+https://github.com/paulmorio/geo2dr.git

Code overview

  • main.py: Script for running/evaluating DR-Augmented models. Automatically learns distributed representations if they are not available for given hyperparameters.
  • nondr_main.py: Script for running/evaluating current state-of-the-art models without distributed representations
  • results_analysis: Notebook containing summaries of the results files outputted by main.py and nondr_main.py. Contains plots, ablation study etc as reported in the paper.
  • train_dr_model.py: Utilities for training models with distributed representations and cho
  • data: Contains code related to inducing substructure patterns and learning distributed representations.
  • modded_chemicalx: A modified version of the ChemicalX package, necessitated by updates to the TorchDrug API, and new dataloaders/models utilising the distributed representations of graphs.

About

Repository for Distributed representations of graphs for drug pair scoring

Topics

Resources

License

Stars

Watchers

Forks