DeepRank/PoNDeR

General

Experimental deep learning architecture for scoring protein-protein interactions.

See the PointNet paper for the original architecture description. This implementation contains two architectures, neither of which includes the transformer (T-Net) networks, so both can be considered variants of the vanilla PointNet. The first differs from vanilla PointNet only in its dropout rate (50%); the second is a novel architecture called Siamese PointNet, shown in the image below.

[Image: the Siamese PointNet architecture]
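A minimal sketch of these two ideas, assuming a PyTorch implementation, is shown below. The layer sizes follow the original PointNet paper and the Siamese weight sharing is inferred from the figure, so this should be read as an illustration rather than the repository's exact code.

```python
# Hedged sketch (not the repository's exact code) of a vanilla PointNet
# encoder without T-Nets, plus a Siamese head that shares the encoder
# between two point clouds. Layer sizes and the input channel count
# are assumptions based on the original PointNet paper.
import torch
import torch.nn as nn


class PointNetEncoder(nn.Module):
    """Shared per-point MLP (1x1 convolutions) followed by symmetric pooling."""

    def __init__(self, in_channels=3, avg_pool=False):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(in_channels, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.avg_pool = avg_pool

    def forward(self, x):
        # x: (batch, in_channels, num_points)
        x = self.mlp(x)
        # Symmetric pooling over the points makes the output invariant
        # to the ordering of the input point cloud (the --avg_pool idea).
        return x.mean(dim=2) if self.avg_pool else x.max(dim=2).values


class SiamesePointNet(nn.Module):
    """Two point clouds pass through the *same* encoder; the pooled
    features are concatenated and regressed to a single score."""

    def __init__(self, in_channels=3, dropout=0.5):
        super().__init__()
        self.encoder = PointNetEncoder(in_channels)
        self.head = nn.Sequential(
            nn.Linear(2 * 1024, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, 256), nn.Dropout(dropout), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, cloud_a, cloud_b):
        feat = torch.cat([self.encoder(cloud_a), self.encoder(cloud_b)], dim=1)
        return self.head(feat)
```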

Other adaptations include a cosine-annealing learning-rate schedule, implemented to improve the accuracy and generalizability of the trained network (see Stochastic Gradient Descent with Warm Restarts), and a custom loss function, FavorHigh, that biases learning towards higher-scoring decoys.

[Image: the FavorHigh loss function]
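The sketch below illustrates both training adaptations. The scheduler is PyTorch's built-in warm-restart cosine annealing (the restart period is an assumption); the weighted loss is only an illustration of biasing learning towards higher-scoring decoys, not necessarily the exact FavorHigh formula.

```python
# Hedged sketch of the training adaptations described above.
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts


def favor_high_loss(pred, target, bias=1.0):
    """Squared error weighted by the target score, so high-scoring decoys
    contribute more to the gradient (assumed form of the FavorHigh idea)."""
    weights = 1.0 + bias * target
    return (weights * (pred - target) ** 2).mean()


model = torch.nn.Linear(1024, 1)            # stand-in for the PointNet regressor
optimizer = SGD(model.parameters(), lr=1e-4, momentum=0.9)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=5)  # restart period assumed

for epoch in range(15):
    # ... run one epoch, computing favor_high_loss on each batch ...
    scheduler.step()                        # anneal and periodically restart the LR
```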

Dependencies

Usage

python train.py

  --batch_size BATCH_SIZE   Input batch size (default = 256)
  --num_points NUM_POINTS   Points per point cloud used (default = 1024)
  --num_epoch NUM_EPOCH     Number of epochs to train for (default = 15)
  --CUDA                    Train on GPU
  --out_folder OUT_FOLDER   Model output folder
  --model MODEL             Model input path
  --data_path DATA_PATH     Path to HDF5 file
  --lr LR                   Learning rate (default = 0.0001)
  --optimizer OPTIMIZER     What optimizer to use. Options: Adam, SGD, SGD_cos
  --avg_pool                Use average pooling for feature pooling (instead of default max pooling)
  --dual                    Use Siamese PointNet architecture
  --metric METRIC           Metric to be used. Options: irmsd, lrmsd, fnat, dockQ (default)
  --dropout DROPOUT         Dropout rate in the last layer; when 0, it is replaced by batchnorm (default = 0.5)
  --root                    Apply square root on metric (for DockQ score balancing)
  --patience PATIENCE       Number of epochs to observe overfitting before early stopping
  --classification          Classification instead of regression
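
For example, training the Siamese variant on a GPU with the cosine-annealed SGD optimizer and the DockQ metric could look as follows (the file and folder names are placeholders):

```
python train.py --data_path decoys.h5 --out_folder models/ --CUDA --dual --optimizer SGD_cos --metric dockQ
```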

The network takes the atoms participating in an interaction as a point cloud. Data conversion can be performed with the extract_pc.py script.

Data is saved in HDF5 format containing three groups: train, test, and holdout. Datasets within these groups contain atom features in float32 precision, with attributes holding the iRMSD, lRMSD, FNAT, and DockQ scores.
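A minimal sketch of reading such a file with h5py is shown below; the exact group, dataset, and attribute names are assumptions about the output of extract_pc.py (the score attributes are assumed to match the --metric option names).

```python
# Hedged sketch of reading the HDF5 layout described above.
import h5py
import numpy as np

with h5py.File("decoys.h5", "r") as f:               # hypothetical file name
    train = f["train"]                               # groups: train, test, holdout
    for name, dset in train.items():                 # assumed: one dataset per decoy
        points = np.asarray(dset, dtype=np.float32)  # atom features, float32
        dockq = dset.attrs["dockQ"]                  # per-decoy score attributes
        irmsd = dset.attrs["irmsd"]                  # (attribute names assumed)
        # ... feed `points` to the network, using the chosen metric as target
```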

Current state

- Architecture & training scripts have been fully implemented
