Subpred4

Setup (tested on Ubuntu 22.04 LTS)

Clone repository
Download subpred4_data.tar.gz and place in repository folder

OneDrive download link (~50GB)
Extract raw data
```
make data_import
```

Install Mambaforge
```
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh
source ~/.bashrc
```

Recreate the exact conda environment (if this fails, which can happen on a different OS, use environment_history.yml instead)
```
mamba env create --file environment.yml
```
Activate conda environment
```
conda activate subpred4
```
Install code as python package into environment
```
make package
```
Create BLAST databases (needs >100GB of disk space and takes several hours; pre-computed PSSMs are available in data/intermediate)
```
make blast_databases
```
Run the notebooks in order, according to their filenames
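
To see the order in which the notebooks would be run, a minimal sketch like the following can list them. It assumes the notebooks live in the notebooks folder and that their filenames start with a number; the helper names and the example filenames are hypothetical and not part of the subpred package.

```python
import re
from pathlib import Path


def natural_key(name):
    """Split a filename into text and integer chunks so '2_' sorts before '10_'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def ordered_notebooks(folder):
    """Return all .ipynb files in `folder`, sorted by filename in natural order."""
    return sorted(Path(folder).glob("*.ipynb"), key=lambda p: natural_key(p.name))
```

Calling `ordered_notebooks("notebooks")` from the repository root returns the notebook paths in the order they should be opened. If the filenames use zero-padded numbers, a plain `sorted()` gives the same result.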

API concept

All raw data is left untouched in data/raw. The download commands and versions can be found in the preprocessing notebook. All files are based on the same UniProt release (2022_05). Re-running the download commands fetches the latest version of each file, which can lead to incompatibilities, because databases derived from a given UniProt release are not all published at the same time.

Preprocessing is performed on the raw data, and the processed data is then saved as pickles in data/datasets for fast I/O. The function subpred.util.load_df can be used to read these pickles.
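
The caching pattern described above can be sketched with the standard library alone. This is an illustration under assumptions, not the implementation or signature of subpred.util.load_df: the helper names `save_dataset` and `load_dataset`, the `.pickle` extension, and the example record are all hypothetical, and a list of dicts stands in for a dataframe.

```python
import pickle
from pathlib import Path


def save_dataset(obj, folder, name):
    """Write `obj` to <folder>/<name>.pickle and return the resulting path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}.pickle"
    with path.open("wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_dataset(folder, name):
    """Read back the object stored by save_dataset (hypothetical helper)."""
    with (Path(folder) / f"{name}.pickle").open("rb") as handle:
        return pickle.load(handle)
```

With pandas dataframes, the same round trip is usually done with `DataFrame.to_pickle` and `pandas.read_pickle`. Pickles should only be loaded from trusted sources, since unpickling can execute arbitrary code.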

A transporter dataset can be created manually with all parameters using the methods in subpred.protein_dataset, subpred.go_annotations and subpred.chebi_annotations. This process is simplified through the function subpred.transmembrane_transporters.get_transmembrane_transporter_dataset, which sets most of the parameters.

The function get_transmembrane_transporter_dataset returns three dataframes: one with sequences, one with GO annotations, and one with ChEBI annotations. These three dataframes essentially act like data classes. All of the remaining functions in the package take one or more of these dataframes as input to carry out their calculations, so the dataframes should ideally not be modified before they are passed to those functions.
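
The idea of treating the tables as read-only inputs can be illustrated with a small sketch. It does not use the subpred API: plain Python structures stand in for the dataframes, and the function name, the field names `accession` and `go_term`, and the example values are hypothetical.

```python
def proteins_with_go_term(sequences, go_annotations, go_term):
    """Return a new dict of sequences for proteins annotated with `go_term`.

    Both inputs are only read, never modified, so they can be reused
    by other functions afterwards.
    """
    annotated = {row["accession"] for row in go_annotations
                 if row["go_term"] == go_term}
    return {acc: seq for acc, seq in sequences.items() if acc in annotated}


# Hypothetical stand-ins for the sequence table and the GO annotation table.
sequences = {"P1": "MKT", "P2": "MAA"}
go_annotations = [
    {"accession": "P1", "go_term": "GO:0022857"},
    {"accession": "P2", "go_term": "GO:0005215"},
]

subset = proteins_with_go_term(sequences, go_annotations, "GO:0022857")
```

Returning a new object instead of filtering in place keeps the original tables consistent with each other, which is what the downstream functions rely on.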
