QSARMPC_DTIMPC

Secure multiparty computation for privacy-preserving drug discovery.

Install

Clone this repository.

git clone https://github.com/rongma6/QSARMPC_DTIMPC.git
cd QSARMPC_DTIMPC

Download the PrivPy framework archive bazel-bin.tar.gz (Li and Xu, 2019), place it in the QSARMPC_DTIMPC directory, and unpack it.

tar -xzvf bazel-bin.tar.gz

Install the dependencies. The system is based on Python 3.6 and requires the packages below. On Ubuntu 18:

sudo apt-get install python3.6 python3-pip
# replace the default python with python 3.6
sudo rm -rf /usr/bin/python
sudo ln -s /usr/bin/python3.6 /usr/bin/python

sudo apt-get install libgmp3-dev libmpfr-dev libmpc-dev build-essential
sudo pip3 install numpy scikit-learn pandas absl-py pycryptodomex

Datasets

For both the QSAR and DTI prediction tasks, the datasets for the MPC algorithms are in mydata. We provide a full dataset for a typical experiment and a toy dataset for a quick start. The steps below describe how the datasets were prepared and can serve as a reference for generating new ones.

For QSAR prediction, the full dataset is generated by preprocessing the METAB dataset in the Kaggle competition (Ma et al., 2015). The original datasets can be downloaded from https://pubs.acs.org/doi/abs/10.1021/ci500747n (ci500747n_si_002.zip in the Supporting Information section). The preprocessing code is in prep/QSAR_prep.py.

For DTI prediction, the full dataset is generated by preprocessing the dataset from DTINet (Luo et al., 2017). The preprocessing consists of publicly computing features of proteins (prep/DTI_public.py), locally computing 1024-bit fingerprint vectors of drugs (prep/DTI_local.py), and randomly splitting the samples into training and test sets for evaluation (prep/DTI_valid.py). In more detail, mydata/DTI_full/data_luo/mat_drug_disease.txt is the same file as mat_drug_disease.txt in the DTINet dataset (Luo et al., 2017). mydata/DTI_full/data_prep/public_protein_feature_800.npy contains the public protein features of dimension 800, generated by

python prep/DTI_public.py 800 20 0.5

where 20 is the maximum number of iterations of the random walk with restart (RWR) and 0.5 is its restart probability. mydata/DTI_full/data_prep/finger_rdkit_1024.npy contains the local 1024-bit fingerprint vectors of drugs, generated by

python prep/DTI_local.py 1024
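
The two RWR arguments passed to prep/DTI_public.py above (maximum number of iterations and restart probability) control an iteration of the following general shape. This is a minimal, dependency-free sketch of a random walk with restart, not the code in prep/DTI_public.py; the function name rwr, the tolerance tol, and the toy adjacency matrix are assumptions made for illustration.

```python
def rwr(adj, restart_prob=0.5, max_iter=20, tol=1e-6):
    """Random walk with restart from every node of a weighted graph.

    adj is a square list-of-lists adjacency matrix. Row i of the result is
    the visiting distribution of a walker that restarts at node i.
    """
    n = len(adj)
    # Row-normalize so that each row is a transition distribution.
    trans = []
    for row in adj:
        total = sum(row)
        trans.append([w / total if total > 0 else 0.0 for w in row])

    # Each walker starts on its own node (identity matrix).
    start = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    state = [row[:] for row in start]

    for _ in range(max_iter):
        new_state = []
        for i in range(n):
            row = []
            for j in range(n):
                walked = sum(state[i][k] * trans[k][j] for k in range(n))
                row.append((1 - restart_prob) * walked + restart_prob * start[i][j])
            new_state.append(row)
        # Largest change in any entry, used as the (assumed) stopping test.
        delta = max(abs(new_state[i][j] - state[i][j])
                    for i in range(n) for j in range(n))
        state = new_state
        if delta < tol:
            break
    return state


# Hypothetical 3-node graph: node 0 is linked to nodes 1 and 2.
toy_adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
diffusion = rwr(toy_adj, restart_prob=0.5, max_iter=20)
```

A higher restart probability keeps each distribution closer to its start node, and the iteration cap bounds the running time when the tolerance is not reached.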

Files in mydata/DTI_full/trial1/ are the training and test datasets, generated by

python prep/DTI_valid.py 1 0 dense

where one can change the random seed, the fold id, and the format of the training dataset. _train_dense is the training dataset: a dense matrix with ones for the positive samples and zeros elsewhere. Only the positive samples affect the training process of DTIMPC, so set the format of the training dataset to dense. For other algorithms that require negative samples, setting the format to sparse provides as many negative samples as positive samples.

_test1basic corresponds to the test dataset with a 1:1 ratio of positive to negative samples; _test1basic and _test9extra together correspond to the test dataset with a 1:10 ratio; and _test1basic and _testallextra together correspond to the test dataset with all samples. In all three settings, the test dataset contains no samples from the training dataset.

Here we show the whole pipeline as if all data came from one entity. In practice, the private data are owned by different clients.
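
To make the dense and sparse training formats concrete, here is a small sketch. It is not the logic of prep/DTI_valid.py; the function name make_training_sets, its parameters, and the toy pairs are hypothetical and only illustrate the two layouts described above.

```python
import random


def make_training_sets(n_drugs, n_proteins, train_positives, excluded, seed=1):
    """Build the dense and sparse training layouts for a set of positive pairs.

    train_positives: (drug, protein) pairs used as positive training samples.
    excluded: pairs reserved for testing, never sampled as training negatives.
    """
    positives = set(train_positives)

    # Dense format: ones at the positive pairs, zeros everywhere else.
    dense = [[1 if (d, p) in positives else 0 for p in range(n_proteins)]
             for d in range(n_drugs)]

    # Sparse format: the positives plus an equal number of sampled negatives.
    candidates = [(d, p) for d in range(n_drugs) for p in range(n_proteins)
                  if (d, p) not in positives and (d, p) not in excluded]
    rng = random.Random(seed)
    negatives = rng.sample(candidates, len(positives))
    sparse = ([(d, p, 1) for d, p in sorted(positives)]
              + [(d, p, 0) for d, p in sorted(negatives)])
    return dense, sparse


# Hypothetical toy problem: 3 drugs, 4 proteins, 2 training positives.
dense, sparse = make_training_sets(
    n_drugs=3, n_proteins=4,
    train_positives=[(0, 1), (2, 3)],
    excluded={(1, 0), (1, 2)},
)
```

Keeping the reserved test pairs out of the negative pool is one simple way to keep the training and test samples disjoint.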

Set configuration files

The data directory and the hyperparameters are set in configuration files. For example, to set the data directory to mydata/DTI_toy/ and the maximum number of iterations of the privacy-preserving RWR for drugs to 20:

data_dir mydata/DTI_toy/
maxiterd 20

The configuration files for QSARMPC and DTIMPC are conf/QSAR.conf and conf/DTI.conf, respectively. We provide an example configuration file for each dataset, which you can copy and use directly. For example,

cp conf/QSAR_toy.conf conf/QSAR.conf
cp conf/DTI_toy.conf conf/DTI.conf

Alternatively, edit the configuration files to suit your own setup.

IMPORTANT TIPS:

Always set the configuration files conf/QSAR.conf and conf/DTI.conf BEFORE you run the MPC algorithms.
The configuration files must include ALL items that appear in the example configuration files. For the QSAR task, conf/QSAR.conf must also list these items IN ORDER.
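
Because every item must be present (and, for QSAR, in a fixed order), it can help to check a configuration file before starting a run. The sketch below assumes each line holds a key and a value separated by whitespace, as in the data_dir and maxiterd example above. The helper names parse_conf and check_conf and the list of required keys are illustrative; they are not part of this repository, and the real list of items should be taken from the example configuration files.

```python
def parse_conf(text):
    """Parse 'key value' lines into an ordered list of (key, value) pairs."""
    items = []
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        value = parts[1].strip() if len(parts) > 1 else ""
        items.append((parts[0], value))
    return items


def check_conf(items, required_keys, ordered=False):
    """Return a list of problems; an empty list means nothing was flagged."""
    keys = [key for key, _ in items]
    problems = ["missing item: " + k for k in required_keys if k not in keys]
    # With ordered=True the full key sequence must match the required one.
    if ordered and not problems and keys != list(required_keys):
        problems.append("items are not in the expected order")
    return problems


example = "data_dir mydata/DTI_toy/\nmaxiterd 20\n"
items = parse_conf(example)
print(check_conf(items, ["data_dir", "maxiterd"]))
```

For a QSAR configuration, passing ordered=True with the key order of conf/QSAR_toy.conf would flag a reordered file.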

Run

Open two terminal windows.

Run ./bazel-bin/run in the first window.
After about 10 seconds, run ./bazel-bin/client in the second window. Follow the prompts and run the corresponding model. For example,

What do you want to run? DTI or QSAR: DTI
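
The two-window procedure could also be scripted. The following sketch starts the first command, waits, and then runs the second in the foreground so that its prompts stay interactive. The function name launch and its delay parameter are assumptions that simply mirror the manual steps above; this has not been tested against PrivPy.

```python
import subprocess
import time


def launch(server_cmd, client_cmd, delay=10.0):
    """Start the server, wait `delay` seconds, run the client, then stop the server."""
    server = subprocess.Popen(server_cmd)
    try:
        time.sleep(delay)  # give the server time to start before the client connects
        # The client inherits stdin/stdout, so its prompts appear in this terminal.
        client = subprocess.run(client_cmd)
        return client.returncode
    finally:
        server.terminate()
        server.wait()


# Intended use from the repository root (not executed here):
# launch(["./bazel-bin/run"], ["./bazel-bin/client"])
```

Terminating the first process this way may not clean up everything it started, so the stop procedure described in the tips may still be needed afterwards.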

IMPORTANT TIPS:

To stop the process, press Ctrl + C and run make kill.
The experiments on the full datasets require a large amount of memory. We have tested them on a machine with 96 GB of memory.
The PrivPy framework requires an Ubuntu 18 environment.

Check the results

The results are printed while the programs run. You can also check them in the result directory after the run finishes.

For the DTI task, the predicted DTI scores will be in result/Re.txt and the AUPR and AUROC for the three settings (i.e., on 1:1 positive and negative samples, 1:10 positive and negative samples and all samples) will be in result/metrics.txt.
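
For readers who want to recompute the two metrics from predicted scores, the sketch below computes AUROC as the fraction of correctly ordered positive/negative pairs and AUPR as average precision, in plain Python. Which AUPR estimator the repository uses is not stated here, so the function names and the choice of average precision are assumptions, and the labels and scores are made-up toy values.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive sample.

    Assumes no tied scores; ties would need to be grouped by threshold.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy example: two positives and two negatives.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
toy_auroc = auroc(labels, scores)            # 3 of 4 pairs ordered correctly
toy_aupr = average_precision(labels, scores)
```

The pairwise AUROC loop is quadratic in the number of samples, which is fine for a check on a small test set but slow for the all-samples setting.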

For the QSAR task, the predicted bioactivities for the testing data will be in result/ypred_result.txt and the squared Pearson correlation coefficient will be in result/r2_result.txt.
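
As a cross-check of the QSAR metric, the squared Pearson correlation coefficient between measured and predicted bioactivities can be computed as in the sketch below. The function name squared_pearson and the toy values are illustrative; the layout of result/ypred_result.txt is not described here, so the function takes plain Python lists rather than reading that file.

```python
def squared_pearson(y_true, y_pred):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    # Undefined (division by zero) if either list is constant.
    return cov * cov / (var_t * var_p)


# Made-up measured and predicted values.
r2 = squared_pearson([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Because the coefficient is squared, a perfectly anti-correlated prediction also scores 1, so it measures linear association rather than agreement in sign or scale.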

Contacts

If you have any questions or comments, please feel free to email Rong Ma (ma-r17@mails.tsinghua.edu.cn) and/or Jianyang Zeng (zengjy321@tsinghua.edu.cn).
