PyTorch implementation of Protein Dynamically Activated Residues (ProDAR) for dynamics-informed protein function prediction/annotation

chiang-yuan/ProDAR

ProDAR

ProDAR enhances protein function prediction and extracts Dynamically Activated Residues (DARs) using dynamical information obtained from normal mode analysis (NMA). The code accompanies the paper "Encoding protein dynamic information in graph representation for functional residue identification".

[arXiv] [CRPS]

Hierarchy

├── data
│   ├── data-graphs.ipynb
│   ├── data-graphs.py
│   ├── data-sifts.ipynb
│   ├── data-sifts.py
│   ├── graphs-10A
│   ├── nma-anm
│   ├── pdbs
│   ├── pis
│   └── sifts
│       ├── mf_go_codes-allcnt.dat
│       ├── mf_go_codes-thres-50.dat
│       ├── mf_go_codes-thres-50.npy
│       ├── pdb_chains.dat
│       ├── pdbmfgos-thres-50.json
│       ├── sifts-err-1.log
│       └── sifts-err-2.log
├── datasets
│   └── dataset.py
├── evaluation_kfold.py
├── experiment_kfold.py
├── models
│   └── multilabel_classifiers
│       ├── GAT.py
│       ├── GCN.py
│       └── GraphSAGE.py
├── prodar-env.yml
└── prodar.py

Environment

  1. Create the environment from prodar-env.yml using miniconda:
conda env create -f prodar-env.yml
  2. Install the PyG packages via pip wheels:
pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-geometric

where ${TORCH} and ${CUDA} should be replaced by the PyTorch and CUDA versions (TORCH=1.10.0 and CUDA=cu113 for this specific environment).

  3. Install any extra packages not covered by the previous steps via pip.
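
The wheel index URL used in the PyG installation step follows a fixed pattern, so it can be assembled from the two version strings. The sketch below is only an illustration of that substitution; the helper names `pyg_wheel_index` and `pip_commands` are hypothetical and are not part of this repository.

```python
from typing import List


def pyg_wheel_index(torch_version: str, cuda_tag: str) -> str:
    """Build the PyG wheel index URL for a PyTorch version and CUDA tag."""
    # torch.__version__ may carry a local suffix such as "+cu113"; drop it
    # so the CUDA tag is not duplicated in the URL.
    base_version = torch_version.split("+")[0]
    return f"https://data.pyg.org/whl/torch-{base_version}+{cuda_tag}.html"


def pip_commands(torch_version: str, cuda_tag: str) -> List[str]:
    """Return the three pip commands from the installation step (hypothetical helper)."""
    index = pyg_wheel_index(torch_version, cuda_tag)
    return [
        f"pip install torch-scatter -f {index}",
        f"pip install torch-sparse -f {index}",
        "pip install torch-geometric",
    ]


if __name__ == "__main__":
    # Versions stated for this specific environment.
    for command in pip_commands("1.10.0", "cu113"):
        print(command)
```

If PyTorch is already installed, `torch.__version__` and `torch.version.cuda` report the values to substitute.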

Data

To preprocess the data and generate protein graphs, run the first script to download raw data from the RCSB PDB search API and the PDBe SIFTS API, then run the second script to export the filtered PDB and GO entries as JSON graphs.

  1. Execute data-sifts.py
python data-sifts.py
  2. Execute data-graphs.py
python data-graphs.py

For both steps, equivalent *.ipynb notebooks are provided with markdown notes and optional visualization for use with Jupyter Lab/Notebook.
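
As a rough illustration of what a JSON protein graph could look like, the sketch below builds a residue contact graph from 3D coordinates and serializes it. Everything here is an assumption rather than the behavior of data-graphs.py: the 10 Å distance cutoff is only inferred from the directory name graphs-10A, and the function `contact_graph`, the toy coordinates, and the JSON keys `num_nodes` and `edges` are hypothetical.

```python
import json
import math


def contact_graph(coords, cutoff=10.0):
    """Connect every pair of residues whose distance is within the cutoff.

    coords: list of (x, y, z) tuples, one per residue (for example C-alpha atoms).
    cutoff: distance threshold in angstroms (assumed, not taken from the repo).
    """
    n = len(coords)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append([i, j])
    return {"num_nodes": n, "edges": edges}


# Toy coordinates: three residues 3.8 angstroms apart and one far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
graph = contact_graph(coords)
serialized = json.dumps(graph)
print(serialized)
```

The actual graphs exported by the script may carry additional node and edge attributes; consult data-graphs.py or its notebook for the real schema.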

Run

Experiment (currently only k-fold cross-validation is supported)

python experiment_kfold.py <options>
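
For readers unfamiliar with k-fold cross-validation, the sketch below shows the general idea of splitting sample indices into k train/validation partitions. It is a generic standard-library illustration, not the splitting logic of experiment_kfold.py; the function `kfold_indices` and its parameters are hypothetical.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for each of the k folds."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    for fold_id, (train, validation) in enumerate(kfold_indices(10, k=5)):
        print(fold_id, sorted(train), sorted(validation))
```

Each sample appears in exactly one validation fold, so every sample is used for validation once and for training k - 1 times.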

Evaluation (currently evaluates all saved models in history/)

python evaluation_kfold.py

Citing

If you use the scripts, analyses, models, results, or any snippets from this work and find them useful, please cite the associated paper:

@article{chiang2022encoding,
  title={Encoding protein dynamic information in graph representation for functional residue identification},
  author={Chiang, Yuan and Hui, Wei-Han and Chang, Shu-Wei},
  journal={Cell Reports Physical Science},
  volume={3},
  number={7},
  pages={100975},
  year={2022},
  publisher={Elsevier}
}

License

TBD
